Loading historical data from a CSV file

Use the CSV tooling available in the ea-events-tooling container as an alternative method to import real historical data. You might need to do this if there are firewall restrictions around your database servers, or if your database technology is not supported.

Before you begin

Complete the following prerequisite items:
  • The ea-events-tooling container is installed by the operator. The container is not started as a pod; it contains scripts to install data on the system, which you can run with the kubectl run command.
  • Find the values of image and image_tag for the ea-events-tooling container, from the output of the following command:
    kubectl get noi <release_name> -o yaml | grep ea-events-tooling
    Where <release_name> is the custom resource release name of your cloud deployment. For example, in the following output, image is ea-events-tooling, and image_tag is 2.0.14-20200120143838GMT.
    kubectl get noi <release_name> -o yaml | grep ea-events-tooling
        --env=CONTAINER_IMAGE=image-registry.openshift-image-registry.svc:5000/default/ea-events-tooling:2.0.14-20200120143838GMT \
        --image=image-registry.openshift-image-registry.svc:5000/default/ea-events-tooling:2.0.14-20200120143838GMT \
    Hybrid deployment: For a hybrid deployment, run the following command:
    kubectl get noihybrid <release_name> -o yaml | grep ea-events-tooling
    Where <release_name> is the custom resource release name of your hybrid deployment.
    IBM® Netcool® for AIOps deployment: For an online or offline (air-gapped) deployment of Netcool Operations Insight® with IBM Cloud Pak for AIOps, find the values of image and image_tag from the noi-operator CSV (ClusterServiceVersion). Run the following command:
    oc get csv <noi-operator> -o yaml | grep olm.relatedImage.NOI_ea-events-tooling: | awk -F ': ' '{print $2}'
    Where <noi-operator> is the name of the noi-operator CSV.

About this task

The reasons why you might be unable to load historical event data into the system by using the standard methods include the following:
  • Firewall blocks access to the database containing the historical event data.
  • The driver for the database that contains the historical event data is not supported. This is the case for Oracle databases.

You use scripts in the ea-events-tooling container to convert historical data in CSV format to data in JSON format, which can then be ingested into the system.

Procedure

  1. Create a CSV file dump of the historical event data.
    The minimum amount of historical event data required for initial training by Cloud native analytics is three months.
    Note: The CSV file dump must be compressed with gzip and have the file extension .csv.gz; for example: CSV-dump-filename.csv.gz.
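    For example, if the export produced an uncompressed file named CSV-dump-filename.csv, you might compress it with a command similar to the following; the file name is illustrative only:
    gzip CSV-dump-filename.csv
    This command replaces the original file with CSV-dump-filename.csv.gz.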
  2. On the server running in your Netcool Operations Insight cluster, open a Bash shell.
  3. Create a pod based on the ea-events-tooling container, and give the pod a name that is easy to remember. The pod gives you access to the scripts within the container. Complete the following substeps:
    1. Retrieve the name of the tooling image by running the following command:
      oc get noi noi_deployment_name -o yaml | grep ea-events-tooling
      Where noi_deployment_name is the name of the Netcool Operations Insight deployment. You can retrieve this name by running the command oc get noi.
    2. Create the pod using the following command:
      oc run pod-name -it --restart=Never --image=image_name --env=LICENSE=accept --command=true bash
      Where:
      • pod-name is the name you choose for the pod, for example sky-ea-events-tooling.
      • image_name is the name of the tooling image, which you retrieved in step a, with the source registry specified, for example: cp.icr.io/cp/noi/ea-events-tooling:12.0.0-20220316140229GMT
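      For example, assuming the sample pod name sky-ea-events-tooling and the sample image name shown previously, the command might look similar to the following:
      oc run sky-ea-events-tooling -it --restart=Never --image=cp.icr.io/cp/noi/ea-events-tooling:12.0.0-20220316140229GMT --env=LICENSE=accept --command=true bash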
  4. Run the following command to check that the pod started correctly.
    oc get pod pod-name
    Where pod-name is the name you gave to the pod.
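    Output similar to the following, with a STATUS of Running, indicates that the pod started correctly; the pod name and age shown here are illustrative:
    NAME                    READY   STATUS    RESTARTS   AGE
    sky-ea-events-tooling   1/1     Running   0          30s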
  5. In the pod, create the following new directories, as shown in the example command after this list:
    • /app/files
    • /app/output
    • /app/converttest
    • /app/sybasetest
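    For example, one way to create the directories, assuming the pod name that you chose in step 3, is to run mkdir through oc exec from outside the pod:
    oc exec pod-name -- mkdir -p /app/files /app/output /app/converttest /app/sybasetest
    Alternatively, you can create the directories with mkdir from a shell inside the pod.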
  6. Copy the CSV dump file from step 1 into the pod using a command similar to the following:
    oc cp CSV-dump-filename.csv.gz pod-name:/app/files
    Where:
    • pod-name is the name you gave to the pod.
    • CSV-dump-filename is the name of the CSV dump file from step 1.
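    For example, assuming the sample pod name sky-ea-events-tooling and the placeholder dump file name used in this topic:
    oc cp CSV-dump-filename.csv.gz sky-ea-events-tooling:/app/files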
  7. Log in to the pod that you created by using the following command:
    oc exec -ti pod-name bash
    Where pod-name is the name you gave to the pod.
  8. Set the following environment variable based on the timestamp format or formats that are used in the data within the CSV file.
    If only one timestamp format is used, run a command similar to the following, substituting the timestamp format that is used in your data for the sample format:
    export METADATADISCOVERY_DATETIME_CONVERSION_FORMATSET='[yyyy-MM-dd HH:mm:ss]'
    If multiple timestamp formats are used, run a command similar to the following, substituting the timestamp formats that are used in your data for the sample formats:
    export METADATADISCOVERY_DATETIME_CONVERSION_FORMATSET='[yyyy-MM-dd HH:mm:ss, yyyy-MM-dd hh:mm:ss a]'
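    For example, the following illustrative values show which date format pattern each timestamp style matches:
    2020-01-20 14:38:38     matches the format yyyy-MM-dd HH:mm:ss
    2020-01-20 02:38:38 PM  matches the format yyyy-MM-dd hh:mm:ss a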
  9. Run the checkfiletofile.sh script to convert the CSV data to parsed JSON format.
    /app/bin/checkfiletofile.sh files/CSV-dump-filename.csv.gz output -csv
    Where CSV-dump-filename is the name of the CSV dump file from step 1.

    This script generates JSON output in a gzip-compressed file named CSV-dump-filename.json.gz in the /app/output directory that you created in step 5. The directory also contains other files, which you can ignore.

  10. Run the convertfiletofile.sh script from the /app directory in the pod to convert the timestamps to epoch format:
    bin/convertfiletofile.sh output/CSV-dump-filename.json.gz converttest/
    This script generates JSON output in a gzip-compressed file named CSV-dump-filename.json.gz in the /app/converttest directory that you created in step 5.
  11. Optional: To create a file that resolves the correct column name information, run the following commands, which connect to the ObjectServer:
    export JDBC_SYBASE_HOST=$(env | grep OBJSERV_AGG_PRIMARY_SERVICE_HOST | sed s/.*=//)
    
    bin/convertfiletosybasefile.sh converttest/CSV-dump-filename.json.gz sybasetest/

What to do next

You can now use the JSON output file CSV-dump-filename.json.gz to run your data through the system as described in Training with local data.
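
If you need the converted file outside the pod, you might copy it out with a command similar to the following; the pod name, directory, and file name are illustrative and depend on whether you ran the optional step 11:
oc cp pod-name:/app/converttest/CSV-dump-filename.json.gz ./CSV-dump-filename.json.gz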