Using the IBM Spectrum Discover application catalog

Use the IBM Spectrum® Discover application catalog to search, download, or deploy applications that are provided by IBM®, customers, or third parties for use in IBM Spectrum Discover.

To use the commands in the examples throughout this document, you must log in to the IBM Spectrum Discover node with Secure Shell (SSH). You must also have an authentication token that is generated from the command-line interface (CLI). The token expires after one hour. Run the following commands to generate a token:
ssh moadmin@<your IP entered during mmconfigappliance>
# Enter the password that you set during mmconfigappliance
export SD_USER=<sdadmin or another user with dataadmin privileges>
export SD_PASSWORD=<password for SD_USER above>
export OVA=<IP address or host name of the IBM Spectrum Discover host>
gettoken
Note: In this example, gettoken is an alias under the moadmin user. The alias saves the token in an environment variable that is called TOKEN.
Note: The examples in the sections throughout this document use the tcurl and tcurl_json aliases under the moadmin user, which also use the TOKEN environment variable.
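For reference, these aliases behave like curl wrappers that attach the token. The following is a minimal sketch of how they could be defined, assuming curl with an Authorization header; the actual definitions under the moadmin user might differ:
alias tcurl='curl -s -k -H "Authorization: Bearer ${TOKEN}"'
alias tcurl_json='curl -s -k -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json"'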

Information about the endpoints

Follow this procedure to access information about the endpoints:
  1. Go to IBM Spectrum Discover Documentation.
  2. Choose the version of IBM Spectrum Discover that you are running.
  3. Go to Table of Contents > REST API > Application management using APIs.

Querying the available applications

Run this command to query the applications that are available on Docker Hub:
tcurl https://${OVA}/api/application/appcatalog/publicregistry | jq
The output contains information that is gathered from the image itself and from Docker Hub.
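For example, you can filter the output with jq to list only the repository names and versions. This sketch assumes that the response is a JSON array of objects with repo_name and version fields; the exact structure can vary by release:
tcurl https://${OVA}/api/application/appcatalog/publicregistry | jq '.[] | {repo_name, version}'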

Running an application as a Kubernetes pod

After you decide, from the query output, which application you want to run, you can run it as a Kubernetes pod within IBM Spectrum Discover. Create a JSON-formatted file with the following information (in this example, the file is named example.json):
{
  "repo_name": "ibmcom/spectrum-discover-example-application",
  "version": "1.2.3",
  "description": "Unique description about your use of this application",
  "application_name": "example",
  "my_env_var": "my_value",
  "LOG_LEVEL": "DEBUG"
}
Note: The attributes in the example are explained as follows:
  • The repo_name is the same repo_name that you used to download the application image.
  • The version is the same as the version from the output of the publicregistry command.
  • The description is a unique description that is based on your application use.
  • The application_name is the name that is registered within the policyengine. The system automatically appends -application to the end of the application name for identification.
Run the following command to start the application as a Kubernetes pod:
tcurl_json https://localhost/api/application/appcatalog/helm -d@example.json -X POST | jq
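If you have shell access to the node, you can verify that the pod started. This sketch assumes that kubectl is available and configured on the node; the pod name is derived from the generated deployment name:
kubectl get pods | grep example-application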
You can add environment variables to the JSON example. These can be variables that your application needs, or variables that override software development kit (SDK) defaults. The application SDK supports the following environment variables that override default settings (an example deployment JSON follows this list):
LOG_LEVEL - INFO (default), DEBUG
Specifies the log level for the application to run with.
MAX_POLL_INTERVAL - 86400000 milliseconds (default; 1 day)
Specifies how long the Kafka consumer can go between polls before it is considered unresponsive. Set this value higher than the time that the application needs to process up to 100 records before it sends the reply to IBM Spectrum Discover. The default allows approximately 15 minutes for each record.
PRESERVE_STAT_TIME - False (default), True
Specifies whether to preserve atime or mtime when you run the deep-inspection application. If the application processes records from Network File System (NFS), Server Message Block (SMB), or local IBM Storage Scale connections, the system preserves the exact atime or mtime (in nanoseconds).

If the application processes records from a remote IBM Storage Scale connection, the system preserves atime or mtime up to and including seconds (with no subsecond preservation). The connection user must also have write access to the files. If the connection user does not have write access to the files, the system skips restoration of the atime or mtime because of permission errors. If DEBUG is on, the logs show the original atime or mtime values, so you can manually restore any that fail.
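For example, a deployment JSON that overrides the SDK defaults might look like the following (the override values here are illustrative only):
{
  "repo_name": "ibmcom/spectrum-discover-example-application",
  "version": "1.2.3",
  "description": "Unique description about your use of this application",
  "application_name": "example",
  "MAX_POLL_INTERVAL": "172800000",
  "PRESERVE_STAT_TIME": "True",
  "LOG_LEVEL": "DEBUG"
}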

Running an application as a Python file

To run an application as a Python file, you need to download the source code from the IBM Spectrum Discover application catalog repository, which is available in the public IBM GitHub repository. You can also build your own application or code and run it as a Python file. For more information about building your own application, see the Building your application section in the ExampleApplication repository in the IBM GitHub repository.

After you download or build your application source code, perform the following steps to run the application as a Python file:
  1. Issue the following command to install the required Python packages:
    sudo python3 -m pip install -r requirements.txt
    
    
  2. Issue the following command to install the required OS packages. If you have any NFS connections, the nfs-utils package is required:
    sudo yum install nfs-utils
    Note: For each connection that is defined on the administration page in the UI, the IBM Spectrum Discover application SDK creates the following:
    • An SFTP connection for each IBM Storage Scale connection
    • A local NFS mount for each NFS connection
    • A boto3 client for each COS connection
  3. Define the environment variables as shown:
    export SPECTRUM_DISCOVER_HOST=https://<spectrum_discover_host>
    # The IP or Fully Qualified Domain Name of your IBM Spectrum Discover instance.
    # Default: https://localhost
    
    export APPLICATION_NAME=<application_name>
    # A short but descriptive name of your application.
    # EX: exif-header-extractor-application or cos-x-amz-meta-extractor-application
    # Default: sd_sample_application
    
    export APPLICATION_USER=<application_user>
    # A dataadmin user. Ex: sdadmin
    
    export APPLICATION_USER_PASSWORD=<application_user_password>
    # The password of the above dataadmin user.
    
    export KAFKA_DIR=<directory_to_save_certificates>
    # Directory where the kafka certs will be saved.
    # Default: ./kafka
    
    export LOG_LEVEL=<ERROR|WARNING|INFO|DEBUG>
    # Default: INFO
    
  4. Start the sample application by issuing the following command. (A combined sketch of steps 3 and 4 follows this procedure.)
    sudo -E python3 ./ExampleApplication.py
    
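For convenience, steps 3 and 4 can be combined in a small shell script. This is a sketch that uses the default values from step 3; replace the user and password with your own:
#!/bin/bash
# Set the SDK environment and launch the example application.
export SPECTRUM_DISCOVER_HOST=https://localhost
export APPLICATION_NAME=sd_sample_application
export APPLICATION_USER=sdadmin
export APPLICATION_USER_PASSWORD='<application_user_password>'
export KAFKA_DIR=./kafka
export LOG_LEVEL=INFO
sudo -E python3 ./ExampleApplication.py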

Scaling an application

By design, an application processes records one at a time. To process records in parallel, you can scale the number of replicas that the pod runs. You can scale up to 10 replicas, based on the number of partitions that are available for the Kafka topics. Create a JSON-formatted file with the following information (in this example, the file is named replicas.json):
{
  "replicas": 10
}
Then, run the following command to scale the replicas:
tcurl_json https://localhost/api/application/appcatalog/helm/interesting-anaconda-example-application -d@replicas.json -X PATCH
Note: In this example, interesting-anaconda-example-application is the combination of the deployment_name and the chart_name from the Running an application as a Kubernetes pod section.
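To scale back down, apply the same PATCH request with a smaller replica count. For example, after you change replicas.json to contain "replicas": 1, run:
tcurl_json https://localhost/api/application/appcatalog/helm/interesting-anaconda-example-application -d@replicas.json -X PATCH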

Stopping an application

Run the following command to stop an application (regardless of how many replicas you scaled it to):
tcurl_json https://localhost/api/application/appcatalog/helm/interesting-anaconda -X DELETE | jq
Note: In this example, interesting-anaconda is the deployment_name that was assigned when the application was started.