Automating the data flow

To simplify the use of the Connectivity Model application, three flow scripts are delivered with IBM® IoT for Energy and Utilities.

Before you begin

Before you run the flow scripts, complete the configuration items that are listed in Configuring the Connectivity Model application.
The incoming master data and reading data must be prepared and placed in the /home/<utility>/staging directory. The files must be in .zip format, and the file names must match the patterns master_data_*.zip and reading_data_*.zip respectively. For example:
master_data_2017-12-19.zip
and
reading_data_2017-12-19.zip
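
For example, the prepared archives can be copied into the staging directory as follows; the source path /tmp/incoming is illustrative:

cp /tmp/incoming/master_data_2017-12-19.zip /home/<utility>/staging/
cp /tmp/incoming/reading_data_2017-12-19.zip /home/<utility>/staging/
ls /home/<utility>/staging/*_data_*.zip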

Procedure

  1. Log in to the Jupyter node as a tenant user with access rights to HDFS and HBase, and start the master data flow.
    1. Open the /home/<utility>/automation directory.
    2. Run the script: ./master_data_flow.sh
      The output example for the zip file master_data_2017-12-19.zip:
      Figure 1. The example output shows the log file, report, and success statement.
      Where master_data_2017-12-19.log is the log directory, master_data_2017-12-19.report contains the quality report, and master_data_2017-12-19.success indicates that the flow was completed without errors.
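      For example, a minimal sketch of running the flow and confirming that it completed; find is used to locate the success marker because the output location depends on the installation:
      cd /home/<utility>/automation
      ./master_data_flow.sh
      # the .success file is created only when the flow completed without errors
      find /home/<utility> -name "master_data_2017-12-19.success"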
  2. From the Jupyter node, start the reading data flow.
    1. Open the /home/<utility>/automation directory.
    2. Run the script: ./reading_data_flow.sh
      Note: If the latest time in the reading data is not yesterday, export the ANALYSIS_LOAD_UNTIL_TIME and ANALYSIS_VOLTAGE_UNTIL_TIME environment variables with a suitable time before you run the script, as shown in the example at the end of this step.
      The output example for the zip file reading_data_2017-12-19.zip:
      Figure 2. The reading data output shows the log file, report, and success status.
      Where reading_data_2017-12-19.log is the log directory, reading_data_2017-12-19.report contains the quality report, and reading_data_2017-12-19.success indicates that the flow was completed without errors.
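      For example, if the latest time in the reading data is 2017-12-19 rather than yesterday, export the variables before the run; the date values are illustrative:
      cd /home/<utility>/automation
      # limit processing to the period that the reading data actually covers
      export ANALYSIS_LOAD_UNTIL_TIME=2017-12-19
      export ANALYSIS_VOLTAGE_UNTIL_TIME=2017-12-18
      ./reading_data_flow.sh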
  3. From the Jupyter node, start the analysis flow.
    1. Open the /home/<utility>/automation directory.
    2. Run the script:
      ./analysis_flow.sh
      Note: If necessary, export the ANALYSIS_LOAD_UNTIL_TIME and ANALYSIS_VOLTAGE_UNTIL_TIME environment variables with a suitable time before you run the script.
      The output example for the analysis flow run on 2018-01-08:
      Figure 3. The analysis flow output shows the log file, report, and success status.
      Where analysis_flow_2018-01-08.log is the log directory, analysis_flow_2018-01-08.report contains the quality report, and analysis_flow_2018-01-08.success indicates that the flow was completed without errors.
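      For example, once the success marker exists, the quality report can be reviewed from the command line; find is used because the output directory depends on the installation:
      # print the quality report for the 2018-01-08 analysis run
      find /home/<utility> -name "analysis_flow_2018-01-08.report" -exec cat {} \;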
  4. Schedule the flows with crontab. The three flows can be run separately, or scheduled with crontab to run at specified times.
    1. Log in as a tenant user.
    2. Run the command:
      crontab -e
    3. Add the crontab entries in the text editor that opens.
      Note: For the format of the crontab file, see the Linux crontab documentation.
      An example crontab file:
      #specify the time zone to be used; this example uses UTC
      CRON_TZ=UTC
      PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin
      # specify the necessary environment variables
      #PYTHON_LIB=
      #SPARK_HOME=
      
      #uncomment the following environment variables to adjust the times used in the flows, if needed
      #ANALYSIS_LOAD_UNTIL_TIME=2016-09-01
      #ANALYSIS_VOLTAGE_UNTIL_TIME=2016-08-31
      #LOAD_UNTIL_TIME=2016-09-01
      #VOLTAGE_UNTIL_TIME=2016-08-31
      
      #master_data_flow & reading_data_flow
      #master data flow scheduled at 14:00 every day, UTC timezone
      00 14 * * * $HOME/automation/master_data_flow.sh &
      #reading data flow scheduled at 15:00 every day, UTC timezone
      00 15 * * * $HOME/automation/reading_data_flow.sh &
      
      #analysis_flow scheduled at 18:00 every Saturday, UTC timezone
      00 18 * * sat  $HOME/automation/analysis_flow.sh &
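
      After the crontab file is saved, the schedule can be verified with a short check; the paths follow the crontab example above:
      # confirm that the entries are installed for the current tenant user
      crontab -l
      # the flow scripts must be executable by the tenant user
      ls -l $HOME/automation/*_flow.sh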