Data integration and migration project

To move data between sources, you must know how your data is structured so that you can determine which information you want to move and how much space will be needed on the new system. You can create a data profiling project to help you understand the structure of your data and identify anomalies.

About this task

The data profiling process consists of five analyses that work together to evaluate your data. After column analysis completes, a frequency distribution for each column is generated and then used as the input for the subsequent analyses. During analysis, inferences are made about your data. The inferences often represent the best choices that you can make to improve the quality of your data. When an analysis completes, you can review the inferences and accept or reject them.
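As an illustration of what a per-column frequency distribution contains, the following minimal Python sketch counts how often each distinct value occurs in each column. It is not the product's implementation. The `frequency_distribution` function, the `customers` sample rows, and the column names are all hypothetical.

```python
from collections import Counter

def frequency_distribution(rows, column):
    """Count how often each distinct value (including None) occurs in one column."""
    return Counter(row.get(column) for row in rows)

# Hypothetical sample rows; a real profiling run would read them from a data source.
customers = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "US"},
    {"id": 3, "country": "DE"},
    {"id": 4, "country": None},
]

# One distribution per column, which later analyses can reuse as input.
distributions = {
    column: frequency_distribution(customers, column)
    for column in ("id", "country")
}

for column, distribution in distributions.items():
    print(column, distribution.most_common())
```

Because the distribution stores each distinct value once with its count, later checks can work on the distinct values instead of rescanning every row.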

When you profile your data for an integration or migration project, certain analysis results are likely to contain the most relevant information about your data. For example, after column analysis, you can view the data classes that were inferred for each column in a frequency distribution. Data must be classified correctly so that it can be evaluated consistently across multiple sources. You can also review the frequency distribution for more information about anomalies and redundant data. Incorrect and redundant data take up space on a system and might slow down the processes that are associated with the data.
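The idea of inferring a class from a frequency distribution and flagging values that do not fit can be sketched as follows. This is a simplified assumption about how such an inference could work, not the classification rules that the product applies. The class names (`integer`, `decimal`, `text`, `missing`) and the functions `classify` and `infer_column_class` are hypothetical.

```python
from collections import Counter

def classify(value):
    """Assign one value to a coarse, hypothetical class."""
    if value is None or str(value).strip() == "":
        return "missing"
    text = str(value).strip()
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        return "text"

def infer_column_class(values):
    """Return the most frequent non-missing class and the values that do not fit it."""
    frequencies = Counter(values)
    class_counts = Counter()
    for value, count in frequencies.items():
        class_counts[classify(value)] += count
    present = {name: n for name, n in class_counts.items() if name != "missing"}
    if not present:
        return "missing", []
    inferred = max(present, key=present.get)
    anomalies = sorted(
        str(value)
        for value in frequencies
        if classify(value) not in (inferred, "missing")
    )
    return inferred, anomalies

# Hypothetical postal code column with one placeholder and one empty value.
postal_codes = ["10115", "94105", "94105", "N/A", "", "60601"]
inferred_class, anomalies = infer_column_class(postal_codes)
print(inferred_class, anomalies)
```

In this sketch the result is only a suggestion. A reviewer would still accept or reject the inferred class and decide how to treat the flagged values.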

If you are transferring non-relational data to a relational data source, you need to know whether the data contains any primary keys. A primary key is a column, or a combination of columns, whose values are unique and identify each row in a table. After you run a key and cross-domain analysis, you can review the primary key candidates and accept or reject them, or find defined keys that already exist in the data. You might also want to assess whether there are any foreign keys in your data. A foreign key is a column whose values match the values of a key column in another table, which creates a relationship between the two tables. You can run a key and cross-domain analysis to identify any foreign key candidates and cross-table relationships.
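The two checks described above can be approximated with simple set operations. The following sketch assumes that a single column is a primary key candidate when its values are non-null and unique, and a foreign key candidate when all of its non-null values also appear in the key column of another table. The product's analysis might use different thresholds and rules. The function names and sample tables are hypothetical.

```python
def primary_key_candidates(rows, columns):
    """Return the columns whose values are all non-null and unique."""
    candidates = []
    for column in columns:
        values = [row.get(column) for row in rows]
        if None not in values and len(set(values)) == len(values):
            candidates.append(column)
    return candidates

def foreign_key_candidates(child_rows, child_columns, parent_rows, parent_key):
    """Return child columns whose non-null values all occur in the parent key column."""
    parent_values = {row[parent_key] for row in parent_rows}
    candidates = []
    for column in child_columns:
        values = {
            row.get(column) for row in child_rows if row.get(column) is not None
        }
        if values and values <= parent_values:
            candidates.append(column)
    return candidates

# Hypothetical sample tables.
customers = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": "US"},
    {"customer_id": 3, "country": "DE"},
]
orders = [
    {"order_id": 100, "customer_id": 1, "quantity": 5},
    {"order_id": 101, "customer_id": 1, "quantity": 7},
    {"order_id": 102, "customer_id": 3, "quantity": 9},
]

pk = primary_key_candidates(customers, ["customer_id", "country"])
fk = foreign_key_candidates(
    orders, ["order_id", "customer_id", "quantity"], customers, "customer_id"
)
print(pk, fk)
```

Uniqueness alone can produce false candidates. In this sample, `quantity` in `orders` happens to be unique, which is why candidates are reviewed and then accepted or rejected rather than applied automatically.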