Lesson 1.3: Getting the sample data

The Db2® Big SQL tutorials use sample data from the Db2 Big SQL service. You must access this data to complete all of the lessons.

About this task

This Db2 Big SQL tutorial uses sample data that is provided in the $BIGSQL_HOME/samples directory on the Linux® file system of the Db2 Big SQL server. By default, the $BIGSQL_HOME environment variable is set to the installed location, which is /usr/ibmpacks/current/bigsql/bigsql.

Procedure

  1. From the Linux command line, log in as the bigsql user, and provide the password if prompted:
    su - bigsql
  2. You will use the CREATE, LOAD, and DROP scripts that are installed with the Db2 Big SQL server. Ensure that you can access these files:
    ls $BIGSQL_HOME/samples/queries
    The following three files contain the SQL statements that you need:
    
    GOSALESDW_ddl.sql
    GOSALESDW_load.sql
    GOSALESDW_drop.sql
  3. Upload the following three files from the queries directory to the distributed file system: GOSALESDW_ddl.sql, GOSALESDW_load.sql, and GOSALESDW_drop.sql. You will modify the files before you upload them.
    1. First, use a cross-platform file transfer application (for example, an FTP or SCP client) to download the files that you want to modify to your local machine.
    2. Edit GOSALESDW_load.sql to specify the correct source location of the data files from which you will load the tables.
      Replace every instance of
      url 'file:///opt/ibm/biginsights/bigsql/samples/data/
      with
      url '/user/bigsql/bi_sample_data/data/
      Each fragment includes the opening quotation mark of the SQL string literal; the file name and closing quotation mark that follow it are unchanged.
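      If you prefer to script the substitution instead of editing the file by hand, a minimal sketch follows. The function name fix_load_paths is hypothetical; it assumes the script to edit is passed as an argument and that GNU or BSD sed is available (a .bak backup of the original is kept).

```shell
# Hypothetical helper: rewrite the sample-data URLs in a LOAD script.
# sed -i.bak edits the file in place and keeps a backup copy.
fix_load_paths() {
  sed -i.bak \
    's|file:///opt/ibm/biginsights/bigsql/samples/data/|/user/bigsql/bi_sample_data/data/|g' \
    "$1"
}
```

      For example, run fix_load_paths GOSALESDW_load.sql after downloading the file; the original is preserved as GOSALESDW_load.sql.bak.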
    3. (Optional) For this tutorial, you will need only 10 out of a possible 68 tables and the data that goes with them. You can modify the three scripts to comment out all but these 10 tables to save on storage.
      The 10 required tables are:
      • GOSALESDW.DIST_INVENTORY_FACT
      • GOSALESDW.EMP_EMPLOYEE_DIM
      • GOSALESDW.GO_BRANCH_DIM
      • GOSALESDW.GO_REGION_DIM
      • GOSALESDW.SLS_ORDER_METHOD_DIM
      • GOSALESDW.SLS_PRODUCT_BRAND_LOOKUP
      • GOSALESDW.SLS_PRODUCT_DIM
      • GOSALESDW.SLS_PRODUCT_LINE_LOOKUP
      • GOSALESDW.SLS_PRODUCT_LOOKUP
      • GOSALESDW.SLS_SALES_FACT
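      Before you modify the scripts, you can optionally confirm that the source data file for each required table is present on the server. The following is only a sanity-check sketch; it assumes $BIGSQL_HOME is set as described in "About this task".

```shell
# Optional sanity check: report whether the data file for each of the
# 10 required tables is present under $BIGSQL_HOME/samples/data.
for t in DIST_INVENTORY_FACT EMP_EMPLOYEE_DIM GO_BRANCH_DIM GO_REGION_DIM \
         SLS_ORDER_METHOD_DIM SLS_PRODUCT_BRAND_LOOKUP SLS_PRODUCT_DIM \
         SLS_PRODUCT_LINE_LOOKUP SLS_PRODUCT_LOOKUP SLS_SALES_FACT; do
  f="$BIGSQL_HOME/samples/data/GOSALESDW.$t.txt"
  [ -f "$f" ] && echo "found:   $f" || echo "missing: $f"
done
```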
    4. Upload the three modified scripts back to the $BIGSQL_HOME/samples/queries directory on the Linux file system of the Db2 Big SQL server.
    5. Upload the three modified scripts to HDFS.
      The target directory on HDFS is queries, under bi_sample_data, which you created previously. To upload the three modified scripts to HDFS, create the queries directory and then run the following put commands. You can copy and paste all four lines of code as a group.
      
      hdfs dfs -mkdir /user/bigsql/bi_sample_data/queries;
      hdfs dfs -put $BIGSQL_HOME/samples/queries/GOSALESDW_ddl.sql /user/bigsql/bi_sample_data/queries; 
      hdfs dfs -put $BIGSQL_HOME/samples/queries/GOSALESDW_load.sql /user/bigsql/bi_sample_data/queries; 
      hdfs dfs -put $BIGSQL_HOME/samples/queries/GOSALESDW_drop.sql /user/bigsql/bi_sample_data/queries
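      The three put commands can also be written as a loop. This is only a sketch: it assumes the hdfs CLI is on your PATH (and skips with a message otherwise), and it ends with an ls so that you can confirm the three scripts arrived.

```shell
# Loop form of the three put commands above; skips gracefully when
# the hdfs CLI is not available on this machine.
if command -v hdfs >/dev/null 2>&1; then
  for f in GOSALESDW_ddl.sql GOSALESDW_load.sql GOSALESDW_drop.sql; do
    hdfs dfs -put "$BIGSQL_HOME/samples/queries/$f" /user/bigsql/bi_sample_data/queries
  done
  # Verify: all three .sql files should be listed.
  hdfs dfs -ls /user/bigsql/bi_sample_data/queries
else
  echo "hdfs CLI not found; run these commands on the Db2 Big SQL server"
fi
```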
  4. Now that you have the statements to create, load, and drop tables in the distributed file system, get the data that you will use to load the tables.
    This data is also installed with the Db2 Big SQL server. However, the LOAD HADOOP statements that you use to populate the tables with data run more efficiently if the data is in the distributed file system.
    The target directory is data, under bi_sample_data, which you created previously. To upload the data for the 10 required tables, create the data directory and then run the following put commands. You can copy and paste all 11 lines of code as a group.
    
    hdfs dfs -mkdir /user/bigsql/bi_sample_data/data;
    hdfs dfs -put $BIGSQL_HOME/samples/data/GOSALESDW.DIST_INVENTORY_FACT.txt /user/bigsql/bi_sample_data/data;
    hdfs dfs -put $BIGSQL_HOME/samples/data/GOSALESDW.EMP_EMPLOYEE_DIM.txt /user/bigsql/bi_sample_data/data;
    hdfs dfs -put $BIGSQL_HOME/samples/data/GOSALESDW.GO_BRANCH_DIM.txt /user/bigsql/bi_sample_data/data;
    hdfs dfs -put $BIGSQL_HOME/samples/data/GOSALESDW.GO_REGION_DIM.txt /user/bigsql/bi_sample_data/data;
    hdfs dfs -put $BIGSQL_HOME/samples/data/GOSALESDW.SLS_ORDER_METHOD_DIM.txt /user/bigsql/bi_sample_data/data;
    hdfs dfs -put $BIGSQL_HOME/samples/data/GOSALESDW.SLS_PRODUCT_BRAND_LOOKUP.txt /user/bigsql/bi_sample_data/data;
    hdfs dfs -put $BIGSQL_HOME/samples/data/GOSALESDW.SLS_PRODUCT_DIM.txt /user/bigsql/bi_sample_data/data;
    hdfs dfs -put $BIGSQL_HOME/samples/data/GOSALESDW.SLS_PRODUCT_LINE_LOOKUP.txt /user/bigsql/bi_sample_data/data;
    hdfs dfs -put $BIGSQL_HOME/samples/data/GOSALESDW.SLS_PRODUCT_LOOKUP.txt /user/bigsql/bi_sample_data/data;
    hdfs dfs -put $BIGSQL_HOME/samples/data/GOSALESDW.SLS_SALES_FACT.txt /user/bigsql/bi_sample_data/data
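    Equivalently, the 11 commands above can be collapsed into a loop over the table names. This is only a sketch: it assumes the hdfs CLI is on your PATH (and skips with a message otherwise), and it uses mkdir -p so that the command succeeds even if the data directory already exists.

```shell
# Loop form of the 11 put commands above; skips gracefully when
# the hdfs CLI is not available on this machine.
tables="DIST_INVENTORY_FACT EMP_EMPLOYEE_DIM GO_BRANCH_DIM GO_REGION_DIM \
SLS_ORDER_METHOD_DIM SLS_PRODUCT_BRAND_LOOKUP SLS_PRODUCT_DIM \
SLS_PRODUCT_LINE_LOOKUP SLS_PRODUCT_LOOKUP SLS_SALES_FACT"
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /user/bigsql/bi_sample_data/data
  for t in $tables; do
    hdfs dfs -put "$BIGSQL_HOME/samples/data/GOSALESDW.$t.txt" /user/bigsql/bi_sample_data/data
  done
else
  echo "hdfs CLI not found; run these commands on the Db2 Big SQL server"
fi
```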
    Alternatively, if you want to upload the data for all 68 tables, run the following command:
    hdfs dfs -put $BIGSQL_HOME/samples/data /user/bigsql/bi_sample_data