Overview of the data flow

The data flow contains four parts: data preparation for the Connectivity Model application; the Extract, Transform, and Load (ETL) process; data validation; and data analysis.

The details of the ETL module are described in this section.

You prepare your own data in .csv format and then load the .csv files into HDFS.
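The preparation step can be sketched roughly as follows. This is a minimal illustration and not the product's own tooling. The column names, file name, and HDFS target directory are hypothetical placeholders, because the required schema and paths are not given here. The sketch assumes that the standard `hdfs dfs` command-line client is installed on the machine that uploads the files.

```python
import csv
import shutil
import subprocess
from pathlib import Path


def write_csv(path, header, rows):
    """Write a header line and data rows to a local .csv file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return Path(path)


def hdfs_put_commands(local_file, hdfs_dir):
    """Build the HDFS shell commands that create the target directory
    and upload the file (-f overwrites an existing copy)."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-put", "-f", str(local_file), hdfs_dir],
    ]


def load_into_hdfs(local_file, hdfs_dir):
    """Run the upload commands if the hdfs client is on PATH.

    Returns the commands and a flag that says whether they were executed,
    so the sketch can also be tried on a machine without Hadoop.
    """
    commands = hdfs_put_commands(local_file, hdfs_dir)
    if shutil.which("hdfs") is None:
        return commands, False
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return commands, True


if __name__ == "__main__":
    # Hypothetical columns and paths, for illustration only.
    csv_file = write_csv(
        "sample.csv",
        ["device_id", "event_time"],
        [["d1", "2020-01-01T00:00:00"]],
    )
    cmds, executed = load_into_hdfs(csv_file, "/tmp/raw")
    for cmd in cmds:
        print(" ".join(cmd), "(executed)" if executed else "(not executed)")
```

Building the command list separately from running it makes it possible to review the exact `hdfs dfs` calls before any file is uploaded.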

After preparing the raw data, you run the ETL and validation module to generate the data for the analysis.