Adding dimensional data through code

If you are familiar with Python, you can add dimension data to existing device types through code.

About this task

You can add dimensions and dimension values programmatically to existing device types.

You can develop a custom function based on the sample custom function to add dimensions. The sample function uses the BasePreload base class of IoT Functions to add dimensions. Each time that the function runs, it deletes and re-creates the dimension table and its values. If several device types share dimensions, you can apply the custom function to each device type to set the dimensions.

Important

Complete the steps in Tutorial: Add a custom function to learn how to create and register a custom function. Refer to that tutorial for the details of those tasks.

If you want to simulate dimensions for a device type, you can use the sample script to add the dimensions to the device type. In the script, use the make_dimension and generate_dimension_data methods of the device type to simulate dimension data.

Adding dimensions by using a custom function

Create a custom function that adds a dimension table to the database and assigns values to the dimensions. Use the following sample code as a template for your function. You can extend the custom function to assign the dimension values. For example, you might extend the function to load dimension values from a CSV file.
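
As one way to extend the function, the following minimal sketch reads dimension values from a CSV file into a data frame that could take the place of the hardcoded values in the sample. The helper name load_dimension_csv, the CSV layout (a device ID column plus one column per dimension), and the choice to ignore rows for unknown device IDs are assumptions made for illustration; they are not part of IoT Functions.

```python
import io

import pandas as pd


def load_dimension_csv(source, id_column, known_ids, dimension_columns):
    """Read dimension values from CSV and keep only rows for known devices.

    Hypothetical helper: `source` is a path or file-like object, `id_column`
    is the device ID column name, and `dimension_columns` lists the
    dimension columns that the dimension table expects.
    """
    df = pd.read_csv(source, dtype=str)
    missing = [c for c in [id_column, *dimension_columns] if c not in df.columns]
    if missing:
        raise ValueError('CSV is missing columns: ' + ', '.join(missing))
    # Keep only devices that exist for the device type.
    df = df[df[id_column].isin(known_ids)]
    # One row per device: the last row in the file wins.
    df = df.drop_duplicates(subset=id_column, keep='last')
    return df[[id_column, *dimension_columns]].reset_index(drop=True)


# Example with an in-memory CSV; in a function you would pass a file path.
csv_text = 'deviceid,dimension_1\nd1,north\nd2,south\nd9,unknown\nd1,east\n'
dim_df = load_dimension_csv(io.StringIO(csv_text), 'deviceid', {'d1', 'd2'}, ['dimension_1'])
print(dim_df)
```

In the sample function, a frame like dim_df could then be passed to self.write_frame in place of the hardcoded frame.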

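The sample code defines two classes, SampleDimensionPreload_random and SampleDimensionPreload_preset. Both classes add a dimension table and assign a value to the dimension for each device ID: the first assigns random values, and the second assigns the preset string preload_value.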

Sample custom function

import logging
import pandas as pd
import numpy as np


from iotfunctions.base import BasePreload
from iotfunctions import ui

from sqlalchemy import Column, Integer, String, Float, DateTime, Boolean, func

logger = logging.getLogger(__name__)

# Specify the URL to your package here.
# This URL must be accessible via pip install.
# Example assumes the repository is private.
# Replace XXXXXX with your personal access token.

PACKAGE_URL = 'git+https://XXXXXX@github.com/<user_id><path_to_repository>@<package'
# If your code is hosted in GitLab, use the format 'git+https://<deploy_token_username>:<deploy_token_password>@gitlab.com/<user_id><path_to_repository>@prod'

class SampleDimensionPreload_random(BasePreload):
    '''
    Create a dimension (dimension_1) and add random values to it.
    The function deletes the old values and reloads the dimension table.
    '''

    dim_table_name = None

    def __init__(self, output_item ='dimension_preload_done'):

        super().__init__(dummy_items=[],output_item = output_item)


    def execute(self, df, start_ts = None,end_ts=None,entities=None):
        '''
        The function uses the preload class to create and load dimension data
        and randomly assign test data.
        '''
        entity_type = self.get_entity_type()
        self.db = entity_type.db
        schema = entity_type._db_schema

        # Get the dimension table name and add dimension values to the table.
        try:
            self.dim_table_name = (entity_type.get_attributes_dict()['_dimension_table_name']).upper()
        except (KeyError, AttributeError):
            # Fall back to the default name if no dimension table name is set.
            self.dim_table_name = entity_type.logical_name + '_DIMENSION'

        msg = 'Dimension table name: ' + str(self.dim_table_name)
        logger.debug(msg)

        # Read the device ID of each device and add the dimension to each device ID.  
        ef = self.db.read_table(entity_type.logical_name, schema=schema, columns=[entity_type._entity_id])
        ids = set(ef[entity_type._entity_id].unique())
        msg = 'entities with device_id present:' + str(ids)
        logger.debug(msg)

        # Note:
        # The make_dimension method only adds a new dimension table.
        # Drop the old dimension table first, which deletes all of the data in it.
        self.db.drop_table(self.dim_table_name, schema=schema)
        # Make a dimension table (dimension_1)
        entity_type.make_dimension(self.dim_table_name,
                                   Column('dimension_1', String(50)), # add dimension_1
                                   **{'schema': schema})
        # Register the device type to add the dimension to its metadata.
        entity_type.register()
        # Randomly generate dimension data:
        # generate random values for dimension_1 for those device IDs
        # that don't already have data.
        entity_type.generate_dimension_data(entities=ids)

        return True

    @classmethod
    def build_ui(cls):
        '''
        Register metadata
        '''
        # Define arguments that behave as function inputs
        inputs = []
        # Define arguments that behave as function outputs
        outputs=[]
        outputs.append(ui.UIStatusFlag(name='output_item'))
        return (inputs, outputs)


class SampleDimensionPreload_preset(BasePreload):
    '''
    Create a dimension (dimension_1) and add temporary values to it.
    The function assigns the string preload_value as the value.
    The function deletes the old values and reloads the dimension table. 
    '''

    dim_table_name = None

    def __init__(self, output_item='dimension_preload_done'):

        super().__init__(dummy_items=[], output_item=output_item)

    def execute(self, df, start_ts=None, end_ts=None, entities=None):
        '''
        The function uses the Preload class to create and load dimension data.
        It assigns preset values.
        '''
        entity_type = self.get_entity_type()
        self.db = entity_type.db
        schema = entity_type._db_schema

        # Get the dimension table name and add dimension values to the table.
        try:
            self.dim_table_name = (entity_type.get_attributes_dict()['_dimension_table_name']).upper()
        except (KeyError, AttributeError):
            # Fall back to the default name if no dimension table name is set.
            self.dim_table_name = entity_type.logical_name + '_DIMENSION'

        msg = 'Dimension table name: ' + str(self.dim_table_name)
        logger.debug(msg)

        # Read the device ID of each device and add the dimension to each device ID. 
        ef = self.db.read_table(entity_type.logical_name, schema=schema, columns=[entity_type._entity_id])
        ids = set(ef[entity_type._entity_id].unique())
        msg = 'entities with device_id present:' + str(ids)
        logger.debug(msg)

        # Note:
        # The make_dimension method only adds a new dimension table.
        # Drop the old dimension table first, which deletes all of the data in it.
        self.db.drop_table(self.dim_table_name, schema=schema)
        # Make a dimension table.
        entity_type.make_dimension(self.dim_table_name,
                                   Column('dimension_1', String(50)),  # add dimension_1
                                   **{'schema': schema})
        # Register the device type to add the dimension to its metadata.
        entity_type.register()

        # Preload dimensional data with preset values.
        # These values are hardcoded, but you can extend the function to
        # load data from a CSV file or through an HTTP request.
        # In this template, the value is hardcoded to 'preload_value'.
        preload_data = {}
        preload_values = np.repeat('preload_value', len(ids))
        preload_data['dimension_1'] = preload_values
        preload_data[entity_type._entity_id] = list(ids)
        df = pd.DataFrame(preload_data)

        # Load the hardcoded data into the database.
        msg = 'Setting columns for dimensional table\n'
        required_cols = self.db.get_column_names(table=self.dim_table_name, schema=schema)
        missing_cols = list(set(required_cols) - set(df.columns))
        msg = msg + 'required_cols ' + str(required_cols) + '\n'
        msg = msg + 'missing_cols ' + str(missing_cols) + '\n'
        logger.debug(msg)

        # Write the dimension data frame to the IBM IoT Platform database table.
        self.write_frame(df=df, table_name=self.dim_table_name, if_exists='append')

        kwargs = {
            'dimension_table': self.dim_table_name,
            'schema': schema,
        }
        entity_type.trace_append(created_by=self,
                                 msg='Wrote dimension to table',
                                 log_method=logger.debug,
                                 **kwargs)

        return True

    @classmethod
    def build_ui(cls):
        '''
        Registration metadata
        '''
        # Define arguments that behave as function inputs.
        inputs = []
        # Define arguments that behave as function outputs.
        outputs = []
        outputs.append(ui.UIStatusFlag(name='output_item'))
        return (inputs, outputs)
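
The preset class computes missing_cols and logs it, but does not act on it. The following minimal sketch shows one way to use that information, assuming that columns absent from the data frame should be written as nulls and that columns the table does not have should be dropped. The helper name align_to_table is hypothetical and is not part of IoT Functions.

```python
import pandas as pd


def align_to_table(df, required_cols):
    """Return a copy of df with exactly the table's columns, in table order.

    Columns that the table requires but df lacks are added as nulls;
    columns that the table does not have are dropped.
    """
    # reindex adds missing columns filled with NaN and drops extras.
    return df.reindex(columns=list(required_cols))


frame = pd.DataFrame({'deviceid': ['d1', 'd2'], 'dimension_1': ['a', 'b'], 'extra': [1, 2]})
aligned = align_to_table(frame, ['deviceid', 'dimension_1', 'dimension_2'])
print(aligned.columns.tolist())
```

In the sample, required_cols would come from self.db.get_column_names, and the aligned frame would be the one passed to self.write_frame.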

When you test the custom function in your local environment, you can use a script similar to the following sample to test the SampleDimensionPreload_random class. If you plan to use preset values, replace the class name SampleDimensionPreload_random with SampleDimensionPreload_preset.

import json
import logging
from ai.function_dimension import (SampleDimensionPreload_random,
                                   SampleDimensionPreload_preset)
from iotfunctions.db import Database
from iotfunctions.enginelog import EngineLogging

EngineLogging.configure_console_logging(logging.DEBUG)

'''
1. Get the database credentials and create a database object
to access the Analytics Service database.
Go to Services > Watson IOT Platform Analytics > Copy to clipboard.
Paste the contents into a credentials_as.json file.
Save the file in the scripts folder.
Take care not to push the credentials file to your external repository.
'''
schema = 'bluadmin' #  set if you are not using the default
with open('./scripts/credentials_as.json', encoding='utf-8') as F:
    credentials = json.loads(F.read())
db = Database(credentials = credentials)

'''
2. Register the custom function.
You must use unregister_functions if you change the method signature or required inputs.
'''
db.unregister_functions(['SampleDimensionPreload_random'])
db.register_functions([SampleDimensionPreload_random])


'''
3. Get the device type.
This example assumes that the device type to which you are adding dimensions already exists.
Add the custom function to this device type to test it locally.
Remember to update the device type name to the name of your device type.
'''
entity_name = 'entity_dimension_test_random'
entity_type = db.get_entity_type(name=entity_name)

# Get the dimension table name and add dimension values to the table.
try:
    dim_table_name = (entity_type.get_attributes_dict()['_dimension_table_name']).lower()
except (KeyError, AttributeError):
    # Fall back to the default name if no dimension table name is set.
    dim_table_name = entity_name + '_dimension'

entity_type._functions.extend([SampleDimensionPreload_random()])

'''
Test the execution of KPI calculations defined for the device type locally.
A local test will not update the server job log or write KPI data to the Analytics Service data
lake. Instead, the KPI data is written to the local file system in csv format.
'''
entity_type.exec_local_pipeline(**{'_production_mode': False})


'''
View device data.
'''
print("Reading new dimension table")
print(dim_table_name)
df = db.read_dimension(dimension=dim_table_name, schema=schema)
print(df.head())

print("Finished reading the device dimension table")
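
Both the sample function and the test script resolve the dimension table name in the same way: they use the _dimension_table_name attribute if it is set, and otherwise derive a name from the device type name. The following sketch isolates that rule as a plain function so that it can be tested without a database. The function name resolve_dimension_table_name and the upper flag are hypothetical, and the sketch normalizes the case of the whole name, which differs slightly from the samples.

```python
def resolve_dimension_table_name(attributes, logical_name, upper=True):
    """Return the configured dimension table name or a derived default.

    `attributes` stands in for the dict returned by
    entity_type.get_attributes_dict() in the samples.
    """
    name = attributes.get('_dimension_table_name')
    if not name:
        # No name configured: derive one from the device type name.
        name = logical_name + '_dimension'
    return name.upper() if upper else name.lower()


print(resolve_dimension_table_name({'_dimension_table_name': 'my_dim'}, 'pump'))
print(resolve_dimension_table_name({}, 'pump', upper=False))
```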

Simulating dimension data

You can simulate dimensions for a device type. You can use a Python script similar to the following script to add dimensions and their values. Random values are set from an array of values. The script includes these steps:

  1. Connect to the Analytics Service.
  2. Create a database object in the Analytics Service database.
  3. Add new database columns for the dimensions and add random values.

Sample script


import json
import logging
from iotfunctions.db import Database
from iotfunctions.enginelog import EngineLogging

from sqlalchemy import Column, String, Integer, Float

EngineLogging.configure_console_logging(logging.DEBUG)

'''
This script adds dimensions to a device type and simulates the values by adding random values.
'''

'''
# 1. Getting DB credentials.
# Go to Services > Watson IOT Platform Analytics > Copy to clipboard.
# Paste the contents in a credentials_as.json file.
# Save the file in scripts.
# Take care not to push the credentials file to your external repository.
'''
schema = 'bluadmin' #  set if you are not using the default
with open('./scripts/credentials_as.json', encoding='utf-8') as F:
    credentials = json.loads(F.read())
db = Database(credentials = credentials)



'''
2. Use the database object to get the device type from the Analytics Service database.
'''
entity_name = 'entity_dimension_simulate'
entity_type = db.get_entity_type(name=entity_name)

# Get the dimension table name and add dimension values to the table.
try:
    dim_table_name = (entity_type.get_attributes_dict()['_dimension_table_name']).lower()
except (KeyError, AttributeError):
    # Fall back to the default name if no dimension table name is set.
    dim_table_name = entity_name + '_dimension'

# db.drop_table(dim_table_name, schema=schema)

'''
3. Create new dimensions by specifying them as columns
Dimension can be of the following types
Integer, INTEGER, Float, FLOAT, String, VARCHAR, DateTime, Timestamp
3.1 Using dimension with predefined values:
When you use any of these dimensions, values are selected from a predefined set.
In the following arrays, a dimension name on the left-hand side generates dimension values from the array on the right-hand side.
Dimension name:Values
['company', 'company_id', 'company_code']: ['ABC', 'ACME', 'JDI']
['plant', 'plant_id', 'plant_code']: ['Zhongshun', 'Holtsburg', 'Midway']
['country', 'country_id', 'country_code']: ['US', 'CA', 'UK', 'DE']
['firmware', 'firmware_version']: ['1.0', '1.12', '1.13', '2.1']
['manufacturer']: ['Rentech', 'GHI Industries']
['zone']: ['27A', '27B', '27C']
['status', 'status_code']: ['inactive', 'active']
['operator', 'operator_id', 'person', 'employee']: ['Fred', 'Joe', 'Mary', 'Steve', 'Henry', 'Jane', 'Hillary', 'Justin', 'Rod']
3.2 Using other dimension names:
Any other dimension name generates random values.
'''
db.drop_table(dim_table_name, schema=schema)
entity_type.make_dimension(dim_table_name,
                           Column('company', String(50)),
                           Column('status', String(50)),
                           Column('operator', String(50)),
                           **{'schema': schema})

entity_type.register()

'''
To test the execution of KPI calculations locally use the following function.
A local test will not update the server job log or write KPI data to the Analytics Service data
lake. Instead, the KPI data is written to the local file system in csv format.
'''
# entity_type.exec_local_pipeline(**{'_production_mode': False})

ef = db.read_table(entity_type.logical_name, schema=schema, columns=[entity_type._entity_id])
ids = set(ef[entity_type._entity_id].unique())
entity_type.generate_dimension_data(entities=ids)
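
To make the naming rule in step 3 of the script concrete, the following sketch imitates the behavior that the script comments describe: a known dimension name draws its value from a predefined list, and any other name gets a random value. It is an illustration only and is not the implementation of generate_dimension_data. The function name simulate_dimension_values, the subset of names in PREDEFINED, the deviceid key, and the format of the random fallback string are assumptions.

```python
import random
import string

# Subset of the name-to-values mapping listed in the script comments.
PREDEFINED = {
    'company': ['ABC', 'ACME', 'JDI'],
    'status': ['inactive', 'active'],
    'zone': ['27A', '27B', '27C'],
}


def simulate_dimension_values(device_ids, dimension_names, seed=None):
    """Build one row of simulated dimension values per device ID."""
    rng = random.Random(seed)
    rows = []
    for device_id in sorted(device_ids):
        row = {'deviceid': device_id}
        for name in dimension_names:
            if name in PREDEFINED:
                # Known name: pick one of the predefined values.
                row[name] = rng.choice(PREDEFINED[name])
            else:
                # Unknown name: random 6-letter string (assumed format).
                row[name] = ''.join(rng.choices(string.ascii_lowercase, k=6))
        rows.append(row)
    return rows


rows = simulate_dimension_values({'d2', 'd1'}, ['company', 'status', 'site'], seed=0)
for row in rows:
    print(row)
```

Passing a seed makes the simulated values repeatable between runs, which can help when you compare test results.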