Hive tables are automatically created every time you run
an activity that moves data from a relational database into a Hadoop
Distributed File System (HDFS) in InfoSphere® BigInsights®.
One Hive table is created for each table in the source that you specify
in the activity. After Hive tables are created, you can use IBM Big
SQL in InfoSphere BigInsights to
read the data in the tables.
Before you create or run activities in InfoSphere Data Click,
you must import one existing Hive table into the metadata repository
by using InfoSphere Metadata Asset Manager and
specifying the JDBC connector.
After you import the Hive table, along with the sources and targets
that you want to use, you can create and run activities. The first
time that you run an activity, a single folder is created and associated
with the activity, for example, Marketing_Reports.
Every time that you run the activity, a subfolder is created inside
Marketing_Reports, and the subfolder is given
the name of the activity instance. The Hive tables that are generated
during the activity run are then stored in the subfolder that was created
for that run.
Note: If the names of the tables and columns that you
specify as your source contain non-ASCII or extended ASCII characters,
then a Hive table is not generated. Also, when you specify the name
of a Hive table, the name you specify cannot contain non-ASCII or
extended ASCII characters. Non-ASCII and extended ASCII characters
are not supported in the names of Hive tables; however, they are allowed
in the actual content of the columns and tables.
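The naming restriction in the note can be checked before you run an activity. The following Python sketch is illustrative only: the function names are hypothetical and are not part of InfoSphere Data Click, and the sketch assumes that "non-ASCII or extended ASCII" means any character outside the 7-bit ASCII range (code points 0 to 127).

```python
def is_plain_ascii(name: str) -> bool:
    """Return True if every character is in the 7-bit ASCII range (0-127)."""
    return name.isascii()


def find_unsupported_names(names):
    """Return the names that contain non-ASCII or extended ASCII characters."""
    return [name for name in names if not is_plain_ascii(name)]


if __name__ == "__main__":
    # Hypothetical source table and column names, for illustration only.
    candidates = ["Customer", "PROJECT", "Kund\u00e9", "Umsatz_\u20ac"]
    for name in find_unsupported_names(candidates):
        print("Unsupported name:", name)
```

Names that the check reports would need to be renamed in the source, because a Hive table is not generated for them.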
When the Hive table is generated, all columns in the table are
separated by the delimiter that was specified on the Policies pane
when you created the activity. All rows in the table are delimited
with the \n control character.
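To illustrate the layout that is described above, the following Python sketch splits text into rows on the \n character and into columns on a delimiter. It is a simplified sketch, not the product's implementation: the function name, the sample data, and the comma delimiter are assumptions, and quoting or escaping of delimiters inside values is not handled.

```python
def parse_rows(text: str, delimiter: str):
    """Split newline-delimited rows into lists of column values."""
    rows = []
    for line in text.split("\n"):
        if line == "":
            continue  # skip empty lines, such as the one after a trailing \n
        rows.append(line.split(delimiter))
    return rows


if __name__ == "__main__":
    # Hypothetical data; the real delimiter is the one chosen on the Policies pane.
    sample = "1001,Jane,Marketing\n1002,Raj,Finance\n"
    for row in parse_rows(sample, ","):
        print(row)
```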
After you run an activity, a Hive table is created and automatically
imported into the metadata repository. You can view the table on the
Repository Management tab in InfoSphere Metadata Asset Manager.
In the Navigation pane, expand Browse Assets, and then click
Implemented Data Resources.
The Hive table is located under the host that was specified when
you created the data connection to the original Hive table that you
imported by using the JDBC connector. For example, in the image below,
the data connection that uses the JDBC connector to connect to the
Hive table specifies the host IPSVM00104. The Hive table is bitest.
Figure 1. A view of the Hive table in InfoSphere Metadata Asset Manager
You can also view the Hive table in Information Governance Catalog.
In the image below, the original source database table that you selected
to move data from in the InfoSphere Data Click activity is the PROJECT asset.
The Hive table that was generated when the activity was run is the
project database table.
Figure 2. A view of the Hive table in Information Governance Catalog
In Information Governance Catalog, you can add data from the Hive table
to collections and view details about it.
For example, the user Jane Doe creates the activity HRActivity to
move the Customer table (source) into the InfoSphere BigInsights folder MarketingReports (target).
When she runs the activity, a Hive table is automatically created
and is mapped to the /MarketingReports/HRActivity_JaneDoe_1381231526864/JaneDoe_Customer folder.
The Hive table contains relational information about the actual data
that is in the Customer table. You can access
the Hive table JaneDoe.Customer and then read
the actual data in the table by using Big SQL.
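The folder and table names in this example appear to follow a pattern. The following Python sketch reproduces the example values under that assumption. The pattern is inferred from this single example and is not a documented naming rule; the function names and the run_id parameter (the numeric suffix of the activity instance name) are hypothetical.

```python
def hive_table_folder(target_folder, activity, user, run_id, table):
    """Build the folder path that the Hive table is mapped to (assumed pattern)."""
    instance = f"{activity}_{user}_{run_id}"  # name of the activity instance
    return f"/{target_folder}/{instance}/{user}_{table}"


def hive_table_name(user, table):
    """Build the qualified Hive table name (assumed pattern)."""
    return f"{user}.{table}"


if __name__ == "__main__":
    # Values taken from the example above.
    print(hive_table_folder("MarketingReports", "HRActivity", "JaneDoe",
                            "1381231526864", "Customer"))
    print(hive_table_name("JaneDoe", "Customer"))
```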