Hive tables

Hive tables are created automatically every time that you run an activity that moves data from a relational database into the Hadoop Distributed File System (HDFS) in InfoSphere® BigInsights®. One Hive table is created for each source table that you specify in the activity. After the Hive tables are created, you can use IBM Big SQL in InfoSphere BigInsights to read the data in the tables.

Before you create or run activities in InfoSphere Data Click, you must import one existing Hive table into the metadata repository by using InfoSphere Metadata Asset Manager and specifying the JDBC connector.

After you import the Hive table, along with the sources and targets that you want to use, you can create and run activities. When you initially run an activity, a single folder, for example Marketing_Reports, is created and associated with the activity. Every time that you run the activity, a subfolder is created inside Marketing_Reports and is given the name of the activity instance. The Hive tables that are generated during an activity run are stored in the subfolder that was created for that run.

Note: Non-ASCII and extended ASCII characters are not supported in the names of Hive tables. If the name of a source table or column contains such characters, a Hive table is not generated for that source. Likewise, a name that you specify for a Hive table cannot contain such characters. These characters are allowed, however, in the actual content of the columns and tables.
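The naming restriction can be checked before an activity is run. The following Python sketch is not part of InfoSphere Data Click; the function names are hypothetical, and it assumes that "supported" means every character is 7-bit ASCII (code points 0 to 127), so that both non-ASCII and extended ASCII characters are rejected.

```python
def is_supported_hive_name(name: str) -> bool:
    """Return True if the name is non-empty and uses only 7-bit ASCII.

    str.isascii() is True only for code points 0-127, so extended ASCII
    characters (128-255) and all other non-ASCII characters are rejected.
    """
    return len(name) > 0 and name.isascii()


def unsupported_names(names):
    """Return the table or column names that would block Hive table creation."""
    return [n for n in names if not is_supported_hive_name(n)]
```

For example, passing the table name and all of its column names to unsupported_names returns the names that need to be changed in the source before a Hive table can be generated.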

When the Hive table is generated, all columns in the table are separated by the delimiter that was specified on the Policies pane when you created the activity. All rows in the table are delimited with the \n control character.
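To illustrate this layout, the following Python sketch splits the text behind a Hive table into rows and columns. It is a simplified illustration only: the sample data and the comma delimiter are assumptions (the real delimiter is whatever was chosen on the Policies pane), and the sketch does not handle escaped delimiters or quoting.

```python
def parse_hive_text(data: str, delimiter: str):
    """Split delimited text into a list of rows, each a list of column values.

    Rows end with the \\n control character; columns within a row are
    separated by the delimiter that was chosen for the activity.
    """
    rows = []
    for line in data.split("\n"):
        if line == "":
            # Skip the empty string that follows a trailing \n.
            continue
        rows.append(line.split(delimiter))
    return rows


# Hypothetical sample data with an assumed comma delimiter.
sample = "1,Alice,Sales\n2,Bob,Marketing\n"
print(parse_hive_text(sample, ","))
# prints [['1', 'Alice', 'Sales'], ['2', 'Bob', 'Marketing']]
```

Because the sketch skips empty lines, a row that consists of a single empty column would be dropped; a production reader would need to treat that case explicitly.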

After you run an activity, the Hive table that was created is automatically imported into the metadata repository. You can view the table on the Repository Management tab in InfoSphere Metadata Asset Manager. In the Navigation pane, expand Browse Assets, and then click Implemented Data Resources. The Hive table is located under the host that was specified when you created the data connection to the original Hive table that you imported by using the JDBC connector. For example, in the image below, the data connection that uses the JDBC connector to connect to the Hive table specifies the host IPSVM00104, and the Hive table is named bitest.

Figure 1. A view of the Hive table in InfoSphere Metadata Asset Manager

You can also view the Hive table in Information Governance Catalog. In the image below, the PROJECT asset is the source database table that was selected in the InfoSphere Data Click activity, and the project database table is the Hive table that was generated when the activity was run.

Figure 2. A view of the Hive table in Information Governance Catalog

In Information Governance Catalog, you can add data from the Hive table to collections and view details about it.

For example, the user Jane Doe creates the activity HRActivity to move the table Customer (the source) into the InfoSphere BigInsights folder MarketingReports (the target). When she runs the activity, a Hive table is automatically created and mapped to the /MarketingReports/HRActivity_JaneDoe_1381231526864/JaneDoe_Customer folder. The Hive table contains relational information about the actual data that is in the Customer table. You can access the Hive table JaneDoe.Customer and then read the actual data in the table by using Big SQL.
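The naming in this example can be reproduced with the following Python sketch. The pattern is inferred from this single example and is an assumption, not a documented rule; the function names are hypothetical, and the numeric suffix of the activity instance is treated as an opaque value that the product supplies.

```python
def hive_table_folder(target_folder: str, activity: str, user: str,
                      run_suffix: str, table: str) -> str:
    """Build the folder path that the Hive table is mapped to.

    Assumed pattern, inferred from the example:
    /<target>/<activity>_<user>_<run suffix>/<user>_<table>
    """
    instance = f"{activity}_{user}_{run_suffix}"
    return f"/{target_folder}/{instance}/{user}_{table}"


def hive_table_name(user: str, table: str) -> str:
    """Build the qualified Hive table name, assumed to be <user>.<table>."""
    return f"{user}.{table}"


print(hive_table_folder("MarketingReports", "HRActivity", "JaneDoe",
                        "1381231526864", "Customer"))
print(hive_table_name("JaneDoe", "Customer"))
```

With the values from the example, the first call yields the folder path shown above and the second yields JaneDoe.Customer, the name that is used to read the table with Big SQL.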