Preparing the data for the ETL module

Explains how you prepare the raw data for the ETL module on the Jupyter node as a .csv file.

The raw data is created as a .csv files, the raw data folder structure is shown as the following:
  • connectivity
  • electric_station
  • exclude_time
  • feeder_root
  • meter
  • meter_load
  • meter_voltage
  • overhead_cable
  • scada
  • scada_load
  • substation_region
  • transformer
  • underground_cable

The prepared data includes master data and reading data. The master data contains the assets in the power grid, and the reading data contains the measurement readings, for example: voltage and load.

As the ETL module supports data in .csv format, by using the csv format, you can edit and update the master data and reading data. You can also incrementally add reading data to the ana store. You can define multiple csv files in each folder. The ETL module does not support sub-folders.

The format of the .csv files is given as follows:

Master data

The master data contains the data of the assets.
Table 1. Connectivity master data
Column name Type Description Constraints
assetId String asset ID Primary Key, Unique, Not NULL.
substation String ID of substation region to where electric station belongs. Foreign Key, must be a valid substation in the substation region table.
type String The asset type.  
phase String The phase of the asset.  
numberOfPhases String The amount of phases.  
normalStatus String The normal status.  
node1 String The node 1.  
node2 String The node 2.  
Table 2. Electric station master data
Column name Type Description Constraints
id String ID of the electric station Primary Key, Unique, Not NULL.
substation String ID of substation region to where electric station belongs. Foreign Key, must be a valid substation in the substation region table.
geometry String The geometry of the electric station. Must be in a valid WKT POLYGON, in WSG84 projection.
Table 3. Exclude time master data
Column name Type Description Constraints
feeder String ID of the feeder. Primary Key. Must be a valid feeder in the feeder table or feeder_group in the feeder_group table
type String Type of analysis. Primary Key. The candidate value include Load, Voltage, Voltage_with_scada.
startTime String Timestamp, in the format of yyyy-MM-ddThh:mm:ss.sss. Primary Key.

The time range to be excluded from the analysis with the startTime being the start time of the range (inclusive).

endTime String Timestamp, in the format of yyyy-MM-ddThh:mm:ss.sss. Primary Key. The time range to be excluded from the analysis with the endTime being the end time of the range (exclusive).
Table 4. Feeder root master data
Column name Type Description Constraints
feeder String ID of the feeder where the meter is connected Foreign Key, must be a valid feeder in the feeder table.
rootAsset String The root asset.  
Table 5. Meter master data
Column name Type Description Constraints
id String ID of the meter Primary Key, Unique, Not NULL.
substation String ID of substation region where the meter belongs to. Foreign Key, must be a valid substation in the substation region table.
feeder String ID of the feeder where the meter is connected Foreign Key, must be a valid feeder in the feeder table.
transformer String ID of the transformer Foreign Key, must be valid transformer in transformer table.
phase String The phase of meter

Must be a valid phase code from 1 to 7.

For example: 0b100=4=A, 0b010=2=B, 0b001=1=C, 0b110=6=AB.
geometry String Geospatial geometry in WKT format, should be a point. Must be a valid WKT POINT, in WSG84 projection.
Table 6. Overhead cable master data
Column name Type Description Constraints
id String ID of the overhead cable Primary Key, Unique, Not NULL.
substation String ID of the substation region where the overhead cable belongs to. Foreign Key, must be a valid substation in the substation region table.
feeder String ID of the feeder where the overhead cable is connected. Foreign Key, must be a valid feeder in the feeder table.
phase String The phase of the overhead cable.

Must be a valid phase code from 1 to 7.

For example: 0b100=4=A, 0b010=2=B, 0b001=1=C, 0b110=6=AB.
geometry String The geometry of the overhead cable. Must be a valid WKT MULTILINESTRING, in WSG84 projection.
Table 7. SCADA master data
Column name Type Description Constraints
assetId String ID of the feeder. Primary Key, Unique, Not NULL.
measurement String Type of analysis

Primary Key

Candidate value must include Load, Voltage, Voltage_with_scada
scadaId String ID of SCADA.  
Table 8. Substation region master data
Column name Type Description Constraints
id String ID of the substation region. Primary Key, Unique, Not Null.
geometry String Geospatial geometry in WKT format, should be a polygon. Must be valid WKT POLYGON. In WSG84 projection
Table 9. Transformer master data
Column name Type Description Constraints
id String ID of the transformer. Primary Key, Unique, Not NULL.
substation String ID of the substation region where the transformer belongs to. Foreign Key, must be a valid substation in the substation region table.
feeder String ID of the feeder where the transformer is connected. Foreign Key, must be a valid feeder in the feeder table.
phase String The phase of the transformer

Must be a valid phase code from 1 to 7.

For example: 0b100=4=A, 0b010=2=B, 0b001=1=C, 0b110=6=AB.
kva Int The kilovolt-amps.  
voltage Int The voltage.  
geometry String Geospatial geometry in WKT format, should be a point. Must be a valid WKT POINT, in WSG84 projection.
Table 10. Underground cable master data
Column name Type Description Constraints
id String ID of the underground cable. Primary Key, Unique, Not NULL.
substation String ID of the substation region where the underground cable belongs to, Foreign Key, must be a valid substation in the substation region table.
feeder String ID of the feeder where the underground cable is connected. Foreign Key, must be a valid feeder in the feeder table.
phase String The phase number.

Must be a valid phase code from 1 to 7.

For example: 0b100=4=A, 0b010=2=B, 0b001=1=C, 0b110=6=AB.
geometry String The geometry of the underground cable. Must be a valid WKT MULTILINESTRING, in WSG84 projection.

Reading data

The reading data contains the current record values, for example, the voltage values and the load values. The reading data is used for the analysis.
Note: These notes are for all reading data tables:
  1. The timestamp must be in same time zone as the corresponding load table. For example both load tables and reading tables must either use UTC or use the local time zone.
  2. The time interval between two timestamps must be fixed and be the same as time interval in corresponding load table. For example, if the time interval is 1 hour, then all readings in the meter load table and feeder load table should use 1 hour interval.
  3. The timestamp should align with timestamp in the corresponding load table. For example, if the feeder load has a timestamp 9:00 then the meter load timestamp should be the same.
  4. The timestamp represents the end edge. For example, if the timestamp is 10:00 and interval used is 1 hour, the kWh is the load between 9:00-10:00.
Table 11. Feeder load reading data
Column name Type Description Constraint
scadaId String ID of the feeder Derived
timestamp String Timestamp in the format yyyy-MM-ddThh:mm:ss.sss . Expected in UTC
kwh1 Double Load of phase A.

The value represents the aggregated active load in the past time interval.

All values must be in units of kWh not MWh.

The gap between feeder load and aggregated meter load should less than 10% otherwise the accuracy is affected.

The kwh1 is load of phase A that maps to phase code 0b100=4.

kwh2 Double Load of phase B.  
kwh3 Double Load of phase C.  
Table 12. Meter load reading data
Column name Type Description Constraint
meterId String ID of the meter

Primary Key, Foreign Key

Must be a valid meter in the meter table
timestamp String Timestamp of reading data, in the format of yyyy-MM-ddThh:mm:ss.sss  
kwh Double Load

Value represent aggregated active load of all 3 phases in past interval.

All values in unit to KWH.
Table 13. Feeder voltage reading data
Column name Type Description Constraint
scadaId String ID of the SCADA tag.

Derived

timestamp String Timestamp in the format yyyy-MM-ddThh:mm:ss.sss. Expected in UTC
volt1 Double Phase A voltage

The value represents the average voltage in the past time interval.

The value must be normalized to the same voltage level. For example, if some meters are 120v, and others are 240v, when the feeder voltage is 1kV, the values is normalized to the minimal voltage level, that is: Value * 120 / 240.

When ch1volt, ch2volt, ch3volt represent the voltage of 3 phases, ch1volt maps to phase A, ch2volt maps to phase B, and ch3volt maps to phase C.

volt2 Double Phase B voltage  
volt3 Double Phase C voltage  
Table 14. Meter voltage reading data
Column name Type Description Constraint
meterId String ID of the meter

Primary Key, Foreign Key

The feeder should be a valid feeder in the feeder table, or feeder group in the feeder_group table.
timestamp String Timestamp yyyy-MM-ddThh:mm:ss.sss.  
volt1 Double Phase A voltage

The value represents the average voltage in the past interval.

The value must be normalized to the same voltage level. For example, if some meters are 120v, and others are 240v, when the feeder voltage is 1kV, the values is normalized to the minimal voltage level, that is: Value * 120 / 240.

Channels ch1, ch2 and ch3 need not be the same value. For a single phase meter, only one channel should have a value. For two phases meters, only two channels should have values. For 3 phases meters, all channels must have values.

volt2 Double Phase B voltage  
volt3 Double Phase C voltage