Exploring your database

Most data mining projects start with a data understanding phase in which you explore the data that is available for your analysis. This tutorial introduces you to the data exploration functions in the Design Studio.

Before you begin, the DWESAMP database must be prepared on the database server. See Preparing the data for the tutorials.

You navigate the database in the Data Source Explorer.
  1. To explore the properties of the BANKCUSTOMERS table, complete the following steps:
    1. In the Data Source Explorer, expand DWESAMP > Schemas > BANK > Tables and select the table BANKCUSTOMERS.

      When you select a table, you can explore the properties of this table in the Properties view.

    2. In the Properties view, click the Columns tab.

      The Columns tab shows the name and the data type of the columns in the selected table.

  2. To explore the table content, right-click the BANKCUSTOMERS table and select Data > Sample Contents.

    The sample contents are displayed in the Result page, showing rows from the table.

  3. To understand the value distribution and the relationship between the columns in a table, complete the following steps:
    1. Right-click the BANKCUSTOMERS table and select Distribution and Statistics > Multivariate Distribution....

      The Multivariate Distribution Input Data Selection window opens.

    2. Accept the default settings to use a random sample for the calculation and click OK.

      The field statistics of the BANKCUSTOMERS table are displayed.

    3. To maximize the view, double-click the BANKCUSTOMERS tab.

      The Field statistics table shows, for each column, the number of rows that contain a value, the number of rows that contain the NULL value, and statistical information such as the minimum, maximum, and mean value.

    4. Select the check boxes next to the field names AGE, GENDER, and MARITAL_STATUS to display the value distribution of these columns.

      By displaying multiple charts, you can explore the relationships between the columns.

    5. Close the BANKCUSTOMERS Field statistics view.
  4. To explore the distribution of the other (independent) variables for each value of a dependent variable, complete the following steps:
    1. Right-click the BANKCUSTOMERS table and select Distribution and Statistics > Bivariate Distribution....

      The Compute Bivariate Distributions Target Field window opens.

    2. Select MARITAL_STATUS as the dependent variable and click Finish.

      A progress bar is displayed while the bivariate distributions are calculated. When the calculation completes, the Clustering Visualizer opens. The Graphics view shows each value of the dependent variable (MARITAL_STATUS) in a separate row.

    3. To view the value distribution of customers with MARITAL_STATUS="widowed", click inside the MARITAL_STATUS="widowed" box.

      You can see that the percentage of women is higher in the "widowed" segment than in the total population of all customers.

    4. Close the Clustering Visualizer.

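The Field statistics table in step 3 reports, for each column, how many rows contain a value, how many contain NULL, and the minimum, maximum, and mean. The following Python sketch reproduces those per-column figures, plus the value counts that the distribution charts are based on, for a few rows held in memory. It is only an illustration of what the numbers mean, not how the Design Studio computes them (the Design Studio uses a random sample by default). The row values, the GENDER codes "F" and "M", and the function names field_statistics and value_distribution are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows for illustration only; they are NOT the contents of
# the BANKCUSTOMERS table. SQL NULL is modelled as Python None.
ROWS = [
    {"AGE": 34, "GENDER": "F", "MARITAL_STATUS": "married"},
    {"AGE": 71, "GENDER": "F", "MARITAL_STATUS": "widowed"},
    {"AGE": None, "GENDER": "M", "MARITAL_STATUS": "single"},
    {"AGE": 52, "GENDER": "M", "MARITAL_STATUS": "married"},
]


def field_statistics(rows, column):
    """Hypothetical helper: non-NULL count, NULL count, min, max, mean.

    Min and max are computed for any orderable values; the mean is only
    computed when every non-NULL value is numeric.
    """
    values = [row[column] for row in rows if row[column] is not None]
    stats = {
        "count": len(values),
        "nulls": len(rows) - len(values),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "mean": None,
    }
    if values and all(isinstance(v, (int, float)) for v in values):
        stats["mean"] = mean(values)
    return stats


def value_distribution(rows, column):
    """Hypothetical helper: how often each value (including None) occurs."""
    return dict(Counter(row[column] for row in rows))


if __name__ == "__main__":
    for column in ("AGE", "GENDER", "MARITAL_STATUS"):
        print(column, field_statistics(ROWS, column))
        print("  distribution:", value_distribution(ROWS, column))
```

For AGE, the one None value is counted under "nulls" and excluded from the minimum, maximum, and mean, which matches how NULL values are reported separately in the Field statistics table.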

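Step 4 compares, for each value of the dependent variable, the distribution of another column with its distribution across all customers. A minimal Python sketch of that comparison follows. It assumes hypothetical in-memory rows, chosen so that the comparison mirrors the pattern described for the "widowed" segment; they are not taken from BANKCUSTOMERS. The helper names percentages and bivariate_distribution are also hypothetical, and the sketch does not reflect how the Clustering Visualizer is implemented.

```python
from collections import Counter, defaultdict

# Hypothetical rows for illustration only; NOT the BANKCUSTOMERS data.
ROWS = [
    {"GENDER": "F", "MARITAL_STATUS": "widowed"},
    {"GENDER": "F", "MARITAL_STATUS": "widowed"},
    {"GENDER": "M", "MARITAL_STATUS": "widowed"},
    {"GENDER": "F", "MARITAL_STATUS": "married"},
    {"GENDER": "M", "MARITAL_STATUS": "married"},
    {"GENDER": "M", "MARITAL_STATUS": "married"},
    {"GENDER": "M", "MARITAL_STATUS": "single"},
    {"GENDER": "F", "MARITAL_STATUS": "single"},
]


def percentages(values):
    """Hypothetical helper: percentage share of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}


def bivariate_distribution(rows, target, field):
    """Hypothetical helper: distribution of `field` within each segment,
    where a segment is one value of the dependent variable `target`."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[target]].append(row[field])
    return {segment: percentages(values) for segment, values in segments.items()}


if __name__ == "__main__":
    overall = percentages(row["GENDER"] for row in ROWS)
    by_status = bivariate_distribution(ROWS, "MARITAL_STATUS", "GENDER")
    print("all customers:", overall)
    for status, dist in by_status.items():
        print(status, dist)
```

With these hypothetical rows, the "widowed" segment is about 66.7% "F" while all customers are 50% "F", which is the kind of segment-versus-population difference that step 4 asks you to look for.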