
Installing the sample analytics pipeline

A primary goal of the sample analytics pipeline is to make adoption and setup easy. Therefore, Docker is used extensively to install and configure the various components. You might need to make modifications to the installation to suit your environment.

The sample analytics pipeline can be installed on Linux® on an x86 system or Linux on IBM Z environments.

The installation process uses Docker to download, install, and configure the various open-source software components, including Grafana, MariaDB or MySQL, Apache Kafka, and several Python packages.

Before you begin

  1. Ensure that you have the following prerequisites installed:
    • Docker
    • Docker Compose (the docker-compose command)
    Note: These instructions and scripts were implemented and tested on a virtual machine that has no other Docker images or containers. If you use these instructions and scripts on a machine that has other Docker images and containers, carefully review each step, script, and other information to ensure no undesirable behaviors result.

    Additionally, these instructions assume that the user ID that performs the installation is authorized to enter Docker commands directly. To ensure that you do not need to issue the sudo command for every Docker command that is entered manually or issued from the included scripts, enter the following command to add the user ID that will manage the installation to the docker group:

    sudo usermod -aG docker userID
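
    After you log out and log back in so that the group change takes effect, you can verify that the user ID runs Docker commands without sudo. For example:

    docker run hello-world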

  2. Ensure that you understand the sample analytics pipeline server configurations and components.
  3. Determine your preferred server configuration (single or dual) and which servers will be Linux on an x86 system or Linux on IBM Z.
  4. To use real-time runtime metrics collection, ensure that you have adequate disk space on the collection server or single-server configuration for the real-time runtime metrics collection logging. For more information, see Runtime metrics collection log files.

Procedure

If you want to set up a dual-server configuration, complete the following steps on both the collection server and the analytics server, unless a step explicitly indicates that it applies to only one of the servers. If you want to set up a single-server configuration, complete the following steps on a single server.

  1. Copy the base/tpfrtmc/bin/tpf_sample_analytics_pipeline.tar.gz file in binary format from your z/TPF source repository to the home directory on your Linux machine. Enter the following command to extract the content from the tar file:
    tar -xf tpf_sample_analytics_pipeline.tar.gz
  2. Default credentials are specified in the tpf_data_sci/tpf_default_credentials.text file. These credentials are used in various scripts that are provided with the sample analytics pipeline. Change the user name and password to values that are more secure for your environment. When you change the passwords, you must also update the files in the tpf_data_sci/Docker directory that reference them.

    There are many components in use in the sample analytics pipeline. The component versions that are indicated were stable at the time of release. To use the latest versions of these components, update the version numbers that are specified in the tpf_data_sci/user_files/tpf_prepare_configurations.yml and tpf_data_sci/user_files/tpf_zrtmc_analyzer_files/requirements.txt files.
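
    For illustration only, version pins in a requirements.txt file follow the standard pip format. The package names and versions shown here are hypothetical examples, not the actual contents of the file:

    # Hypothetical example of pip version pinning; see the requirements.txt file
    # that is provided with the sample analytics pipeline for the actual packages.
    pyyaml==6.0.1
    kafka-python==2.0.2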

  3. For the collection server or a single-server configuration, copy the base/tpfrtmc/bin/tpfrtmc.tar.gz file in binary format from your z/TPF source repository to the tpf_data_sci/Docker/tpf_rtmc_docker_files/ directory. Enter the following command to extract the content from the tar file:
    tar -xf tpfrtmc.tar.gz
  4. For the analytics server or a single-server configuration, copy the base/tpfrtmc/bin/tpf_zmatc_analyzer.tar.gz file in binary format from your z/TPF source repository to the tpf_data_sci/Docker/tpf_zmatc_analyzer_docker_files/ directory. Enter the following command to extract the content from the tar file:
    tar -xf tpf_zmatc_analyzer.tar.gz
  5. Define your Apache Kafka hosts, encryption settings, topic settings, and programmatic variables in the tpf_data_sci/user_files/kafka_hosts.yml file. For more information about how to configure this file, see the comments in the file.
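
    For illustration only, the general shape of this YAML file might resemble the following sketch. The key names shown here are hypothetical; the authoritative keys and values are described by the comments in the kafka_hosts.yml file itself:

    # Hypothetical sketch; see the comments in kafka_hosts.yml for the real keys.
    hosts:
      - host: kafka.example.com:9092
        encryption: true
        modify_script_variables:
          retention_ms: 604800000
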
  6. If Python 3.8 and the pyyaml library, which are used by the tpf_prepare_configurations.sh script, are not installed on your system, enter the following commands to install them:
    1. sudo yum install python38
    2. sudo python3 -m pip install --upgrade pyyaml
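
    You can verify the installation by confirming that the pyyaml library imports cleanly. For example:

    python3 -c "import yaml; print(yaml.__version__)"
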
  7. Change your directory to the Docker directory. Enter the following command:
    cd tpf_data_sci/Docker
  8. For the collection server, analytics server, or single-server configurations, you must prepare your configuration.
    This configuration determines whether the server uses MariaDB or MySQL, runs on Linux on an x86 system or Linux on IBM Z, uses trusted dependency repositories, and more.
    1. Define your settings in the tpf_data_sci/user_files/tpf_prepare_configurations.yml file.
    2. Enter the following command to configure your server:
      ./tpf_prepare_configurations.sh

      To view which files are edited and what changes are made to achieve your desired settings, see the tpf_prepare_configurations.sh script.
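
    For illustration only, the kinds of choices that this file captures might be sketched as follows. The key names shown here are hypothetical; use the keys that are documented in the tpf_prepare_configurations.yml file itself:

    # Hypothetical sketch; see tpf_prepare_configurations.yml for the real keys.
    database: mariadb            # mariadb or mysql
    platform: s390x              # x86 or IBM Z (s390x)
    trusted_repositories: true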

  9. Enter the docker-compose command to start the Docker containers.
    • For the collection server or single-server configurations, take one of the following actions:
      • If you are using a MySQL database, enter the following command:

        docker-compose --file tpf-insights-dashboard-network.yml --file tpf_mysql.yml --file tpf_kafka.yml up -d --build

      • If you are using a MariaDB database, enter the following command:

        docker-compose --file tpf-insights-dashboard-network.yml --file tpf_mariadb.yml --file tpf_kafka.yml up -d --build

        Note: For Kafka configurations on Linux on IBM Z, if you need to rebuild the Kafka container, first remove all files and folders in the tpf_data_sci/Docker/tpf_kafka_docker_files/volumes/kafka-logs directory by issuing the following command:

        rm -rf tpf_data_sci/Docker/tpf_kafka_docker_files/volumes/kafka-logs/*

        Otherwise, you might receive the following error from the Kafka broker when the tpf-kafka-broker container starts:
        The Cluster ID jw3FiOddStufuL211VzUjQ doesn't match stored clusterId.
    • For the analytics server, take one of the following actions:
      • If you are using a MySQL database, enter the following command:

        docker-compose --file tpf-insights-dashboard-network.yml --file tpf_mysql.yml up -d --build

      • If you are using a MariaDB database, enter the following command:

        docker-compose --file tpf-insights-dashboard-network.yml --file tpf_mariadb.yml up -d --build

    For more information about managing containers and images, see the Docker documentation.
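
    You can confirm that the containers started successfully by listing the running containers and their status. For example:

    docker ps --format "table {{.Names}}\t{{.Status}}"
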
  10. Set up the database tables and stored procedures by running the SQL script. For the collection server, analytics server, and single-server configurations, enter the following command:
    ./tpf_setup_db.sh
  11. Run the following script for the collection server or single-server configurations:
    ./tpf_create_kafka_topics.sh

    This script creates the Apache Kafka topics.
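
    You can list the topics that were created by running the standard Kafka CLI inside the broker container. The assumption here is that the kafka-topics.sh script is on the PATH in the tpf-kafka-broker container and that a listener is available on localhost:9092; adjust both for your installation:

    docker exec tpf-kafka-broker kafka-topics.sh --bootstrap-server localhost:9092 --list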

  12. Run the following script for the collection server or single-server configurations:
    ./tpf_modify_kafka_topics.sh hostname:port

    where hostname:port is the hostname and port that is specified in the tpf_data_sci/user_files/kafka_hosts.yml file from step 5.

    This script modifies the Apache Kafka topics based on the modify_script_variables settings that are specified for your host in the tpf_data_sci/user_files/kafka_hosts.yml file.
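
    For example, if your kafka_hosts.yml file defines a host of kafka.example.com on port 9092, you would enter:

    ./tpf_modify_kafka_topics.sh kafka.example.com:9092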

  13. Enter the docker-compose command to start the tpfrtmc Docker containers. For the collection server or single-server configurations, enter the following command:
    docker-compose --file tpf-insights-dashboard-network.yml --file tpf_collection_server.yml up -d --build
  14. Optional: Configure the ZRTMC analyzer instances to support multiple z/TPF systems.
  15. Enter the docker-compose command to start the remaining Docker containers. Enter the following command for the analytics server or single-server configurations:
    docker-compose --file tpf-insights-dashboard-network.yml --file tpf_analytics_server.yml up -d --build
    Note: The ZRTMC analyzer connects to both Apache Kafka and the database upon startup. Any data that is available on the configured Apache Kafka topics will start being processed. The ZMATC analyzer performs analysis on all available message analysis tool results in the database on the analytics server.
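
    You can watch the analyzer containers start by following their logs. For example, for the analytics server configuration:

    docker-compose --file tpf-insights-dashboard-network.yml --file tpf_analytics_server.yml logs --follow
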
  16. Optional: If you have an active firewall, ensure that the ports specified in the YAML (.yml) files are open. For example:
    1. Enter the following command for each port that is exposed by the YAML files:
      sudo firewall-cmd --zone=public --add-port=portID/tcp --permanent
      where portID represents the following ports:
      • For MariaDB or MySQL: 3306
      • For Grafana: 3000
      • For Apache Kafka: 2181, 9092, 9093, 8082, 8000
      • For tpfrtmc: 9090
    2. Reload the firewall by entering the following command:
      sudo firewall-cmd --reload

      You can add all of the ports before you enter the reload command. Additionally, you can use the tpf_data_sci/Docker/tpf_open_firewall_ports.sh script to process all of these commands for the default ports.
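
      If you prefer to open the default ports manually in one pass, a simple shell loop achieves the same result. This sketch assumes the default ports that are listed in this step and the public zone:

      # Open the default sample analytics pipeline ports, then reload once.
      for port in 3306 3000 2181 9092 9093 8082 8000 9090; do
        sudo firewall-cmd --zone=public --add-port=${port}/tcp --permanent
      done
      sudo firewall-cmd --reload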

  17. Optional: If you plan to process tapes created by the name-value pair collection process with the ZCNVP command, enter the docker-compose command to start the tpfrtmc Docker container. For the collection server or single-server configurations, enter the following command:
    docker-compose --file tpf-insights-dashboard-network.yml --file tpf_zcnvp_tpfrtmc.yml up -d --build

What to do next

The sample analytics pipeline is now fully functional.
