Bring in and extract the data
In this section, you will bring in the machine data from the application stack into the repository. Refer to Part 1: Speeding up machine data analysis for more information on the applications and their logs.
Prepared batches of logs from the application stack are provided in the Download section.
Perform the following steps.
- From the Download section, download data_and_config.zip and unzip it.
- Copy data/input_batches to a machine on your BigInsights cluster. This tutorial uses the location /opt/ibm/input_batches; you can change it to any preferred location. Notice the directory structure of the batches: input_batches contains the following three batches, which represent the three layers of the application stack.
- batch_webaccess – Contains logs from the web access layer.
- batch_was – Contains logs from the WebSphere Application Server layer.
- batch_oradb – Contains logs from the Oracle database layer.
- You will use the Import-Extract application chain to perform the Import and Extract steps in one shot. Because the Import application uses the Distributed Copy application, first ensure that the Distributed Copy application is deployed.
- From the BigInsights console, click the Applications tab and select the Manage link.
- In the edit box, type Distributed. The Distributed File Copy application is found and listed under Applications.
- If the application status is NOT_DEPLOYED, click Deploy, as shown in Figure 1.
Figure 1. Deploy Distributed Copy application
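For reference, the layout of the copied batches can be illustrated with a short sketch. The batch names come from this tutorial; recreating them locally is only an illustration of the structure the Import application reads, not part of the actual procedure:

```python
import os

# The three batch directories under input_batches, one per layer of the
# application stack (names taken from this tutorial).
batches = ["batch_webaccess", "batch_was", "batch_oradb"]

# Recreate the expected layout locally, purely to illustrate the structure.
for batch in batches:
    os.makedirs(os.path.join("input_batches", batch), exist_ok=True)

# List the batch directories that the Import application will pick up.
print(sorted(os.listdir("input_batches")))
# → ['batch_oradb', 'batch_was', 'batch_webaccess']
```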
- You are now ready to use the Import-Extract application chain to run the Import and Extraction applications in one shot. From the BigInsights console, click the Applications tab and select the Tree view icon.
- Select the Import-Extract application and provide the inputs and outputs. You can use either the FTP or SFTP protocol for file transfer; the following steps use SFTP.
- Import input path – sftp://<server>/opt/ibm/input_batches.
- Import output path – /GOMDADemo/search/input_batches.
- Credentials file – When using FTP, keep the default value. When using SFTP, create a file containing the contents shown in Listing 1 and save it in HDFS at /user/biadmin/credstore/public/<filename>.
Listing 1. Credentials store file
Provide the location of this file, /user/biadmin/credstore/public/<filename>, as the Credentials file value.
- Extract output path – /GOMDADemo/output/extract_out.
- Extract configuration file path – Keep the default value /accelerators/MDA/extract_config/extract.config.
- If you have already completed Part 2: Speeding up analysis of new log types of this series, you will already have a batch_inbox directory at the location pointed to by the output path, /GOMDADemo/output/extract_out. Running the previous steps incrementally adds the new batches, batch_webaccess, batch_was, and batch_oradb, to the same location. Figure 2 shows the results of a successful run of the Import-Extract chain.
Figure 2. Run the Import Extract chain
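The parameter values entered in the steps above can be kept in one place as a checklist. This is a minimal sketch using the paths from this tutorial; <server> and <filename> are placeholders you must substitute with your SFTP host and your credentials file name:

```python
# Import-Extract chain parameters from this tutorial. The values <server>
# and <filename> are placeholders, not real names.
params = {
    "Import input path": "sftp://<server>/opt/ibm/input_batches",
    "Import output path": "/GOMDADemo/search/input_batches",
    "Credentials file": "/user/biadmin/credstore/public/<filename>",
    "Extract output path": "/GOMDADemo/output/extract_out",
    "Extract configuration file path": "/accelerators/MDA/extract_config/extract.config",
}

# Print the checklist in the order the console form presents the fields.
for name, value in params.items():
    print(f"{name}: {value}")
```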
If you have not completed Part 2: Speeding up analysis of new log types, you will not see the batch_inbox directories in your results. You will add that information in the Plug and play new log types in searching section of this tutorial.