Forwarding logs to Hadoop

You can convert log data to CSV format, and then use file transfer methods to forward the data to a remote Hadoop system.

To transfer files to the remote system, you can use either of the following methods, both of which are based on the Secure Shell (SSH) network protocol:

SFTP
To use this method, you must have OpenSSH installed on z/OS® and SSH running on the remote system. OpenSSH is available as part of IBM® Ported Tools for z/OS.
Co:Z
To use this method, you must have Dovetailed Technologies Co:Z Co-Processing Toolkit for z/OS installed, and the associated Co:Z Target System Toolkit (sometimes referred to as the agent) installed on the remote system. Co:Z requires OpenSSH.
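
As an illustration of the SFTP method only, the following Python sketch builds the contents of a batch file for the OpenSSH `sftp` client and the command line that runs it. The function names, file names, user, host, and remote directory are hypothetical; the product's own batch job might drive the transfer differently. The sketch assumes key-based authentication (which `sftp -b` requires, because batch mode cannot prompt for a password) and paths that contain no spaces.

```python
def sftp_batch_lines(local_files, remote_dir):
    """Return the lines of an sftp batch file that uploads each local file.

    Assumes that paths contain no spaces, so no quoting is applied.
    """
    lines = [f"cd {remote_dir}"]                      # change to the target directory
    lines += [f"put {path}" for path in local_files]  # upload each file in turn
    lines.append("bye")                               # end the sftp session
    return lines


def sftp_command(batch_path, user, host):
    """Return the argument list that runs sftp in batch mode (-b)."""
    return ["sftp", "-b", batch_path, f"{user}@{host}"]


# Hypothetical file names: one CSV file and its corresponding HCatalog file.
print("\n".join(sftp_batch_lines(["log.csv", "log.hcatalog"], "/tmp/incoming")))
print(" ".join(sftp_command("transfer.batch", "hduser", "hadoop.example.com")))
```

Writing the batch lines to a file and passing that file to `sftp -b` keeps the transfer non-interactive, which suits a scheduled job.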

While SFTP simply transfers files to the remote system, Co:Z can also run commands on the remote system: it issues a Hadoop command to put the CSV file into HDFS and, optionally, a Hive command to create a catalog table.
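
The remote-side commands described above can be sketched as follows. This is a minimal illustration, not the commands that the product generates: the function names, file names, and HDFS directory are hypothetical, and the sketch assumes that the optional catalog table is created by passing a script of Hive statements to `hive -f`.

```python
import subprocess


def remote_commands(csv_file, hdfs_dir, hive_script=None):
    """Return the commands to run on the remote Hadoop system, in order."""
    # Copy the transferred CSV file from the local file system into HDFS.
    commands = [["hadoop", "fs", "-put", csv_file, hdfs_dir]]
    # Optionally run a script of Hive statements that creates the catalog table.
    if hive_script is not None:
        commands.append(["hive", "-f", hive_script])
    return commands


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical file names and HDFS directory; the commands are only printed here.
for command in remote_commands("log.csv", "/user/hduser/logs", "log.hcatalog"):
    print(" ".join(command))
```

Calling `run_all` on the returned list would run the commands on a system where the `hadoop` and `hive` clients are installed. Building the commands separately from running them makes it easy to review them first.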

The following figure presents an overview of the process, including the HCatalog table schema and log forwarding using Co:Z or SFTP:

Figure 1. Forwarding logs to Hadoop: transferring CSV and HCatalog files
Figure that shows Transaction Analysis Workbench extracting a log to CSV format, with corresponding optional HCatalog file, for use in Hadoop.

For simplicity, the previous figure shows a batch job that creates only a single CSV file and corresponding HCatalog file. In practice, Transaction Analysis Workbench creates a CSV file and an HCatalog file for each record type that you select for forwarding.