# PMI Notebooks

MAS Predict Version 9.0 (Maximo Application Suite 9.0)

## Table of Contents:
- [Introduction](#introduction)
- [Notebooks](#notebooks)
- [Obtaining the Credentials](#obtaining-the-credentials)
   - [Accessing External MAS Deployment](#accessing-external-mas-deployment)  
- [Monitor Error and Self Signed Certificates](#monitor-error-and-self-signed-certificates)
- [Examples/Tutorials](#examples--tutorials)
   - [RUL Custom Models](#rul-custom-model-notes)
   - [Model Lifecycle Management](#model-lifecycle-management)
   - [Explainability](#explainability)
- [Additional Usage Specific Notes](#additional-usage-specific-notes)

---

## Introduction:

Several sets of notebooks are provided with the MAS Predict offering.

- **WS series of notebooks:** These notebooks provide a simple and quick way to test the ML pipelines against a dataset. They do not involve any connection to IoT systems, the data lake, or Maximo. Their purpose is to help the user test the ML pipeline with datasets ingested as CSV or JSON files. Use these notebooks to identify, engineer, and extract features, and to fit and evaluate models. Once satisfied, replicate the settings in the corresponding PMI notebook to train with data from the data lake and deploy in Monitor.
- **PMI series of notebooks:** These notebooks use the PM pipelines appropriate for the use case under consideration. Once you have an idea of how to fit the ML models (using the WS version of the notebook) with your dataset, you can apply the same settings, features, hyperparameters, and configuration to customize the PM pipelines (as allowed by the configurable parameters).
- **Custom notebooks**: These are custom notebooks that address scenarios / use cases beyond the ones addressed in the WS or PMI series of notebooks.
- **DQ series of notebooks**: These notebooks provide various options for using the DQLearn module to assess data quality. Use them as needed to understand the quality of your data. Data quality checks are also integrated with the PM pipelines in the PMI notebooks.
- **Model Lifecycle Management**: These notebooks show an end-to-end example of how to deploy and use model monitors to check for both data drift and model drift. They include loading a custom dataset, training a model, deploying monitors to that model, and then uploading feedback to simulate data and model drift.

---

## Notebooks
Notebooks have been tested on Watson Studio on CP4D 4.8 with Python 3.10 (IBM Runtime 23.1 on Python 3.10). Recommended kernel capacity: 4 vCPU, 16 GB RAM.

| Notebook                                              | Description                                                                                                         |
| ----------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------ |
| PMI Model Development Tutorial.ipynb                  | Tutorial for PMI Predict Model Development                                                                          |
| PMI - Custom Model Development.ipynb                  | General introduction for Custom model development                                                                   |
| PMI - Failure Probability-Binary Classification.ipynb | PMI notebook for binary type failure classification                                                                 |
| PMI - Failure Probability-MultiClassification.ipynb   | PMI notebook for multi-type failure classification                                                                  |
| WS - Failure Probability - BinaryClassification.ipynb | WS notebook for binary type failure classification                                                                  |
| WS - Failure Probability - MultiClassification.ipynb  | WS notebook for multi-type failure classification                                                                   |
| WS - Root Cause Analysis.ipynb                        | WS notebook to find factors that caused failure. Failure contribution breakdown                                     |
| PMI - Predicted Failure Date.ipynb                    | PMI notebook for predicting failure date using Survival Analysis techniques                                         |
| PMI - Predicted Failure Date-Smart Regression.ipynb   | PMI notebook for predicting failure date using Smart Regression                                                     |
| WS - Predicted Failure Date.ipynb                     | WS notebook for predicting failure date or time to event using Survival Analysis techniques                         |
| WS - SmartRegressionExample.ipynb                     | An example of how to use SmartRegression pipeline for various use cases, including any custom model                 |
| WS - SmartClassificationExample.ipynb                 | An example of how to use SmartClassification pipeline for various use cases, including any custom model             |
| PMI - Anomaly Detection-SemiSupervised.ipynb          | PMI notebook to detect anomalies in a dataset using semi-supervised learning techniques                             |
| WS - Anomaly Detection - SemiSupervised.ipynb         | WS notebook to detect anomalies in a dataset using semi-supervised learning techniques                              |
| PMI - Anomaly Detection-UnSupervised.ipynb            | PMI notebook to detect anomalies in a dataset using unsupervised learning techniques                                |
| WS - Anomaly Detection - UnSupervised.ipynb           | WS notebook to detect anomalies in a dataset using unsupervised learning techniques                                 |
| WS - Anomaly Detection_ConfigBased.ipynb              | Combined WS and PMI notebook features that can be driven using JSON based config to use either Unsupervised or Semi-Supervised techniques |
| PMI - End of Life Curve.ipynb                         | PMI notebook to understand the failure propagation and lifetime curve of assets                                     |
| WS - End of Life Curve.ipynb                          | WS notebook to understand the failure propagation and lifetime curve of assets                                      |
| PMI - Using Corrective Maintenance.ipynb              | Illustrates how to use corrective maintenance records to use with PM pipelines                                      |
| PMI - Use Aggregated Data in Monitor.ipynb            | Illustrates how to use monitor level aggregation with PM pipelines                                                  |
| Custom0-RUL-LSTM-WML.ipynb                            | LSTM based custom model for computing Remaining Useful Life (RUL) of a production asset                             |
| Custom1-RUL-LoadData.ipynb                            | Data loading to support the LSTM based custom model for computing Remaining Useful Life (RUL) of a production asset |
| Custom2-RUL-Scoring.ipynb                             | Scoring example for LSTM based custom model for computing Remaining Useful Life (RUL) of a production asset         |
| Custom3-RUL-in-Predict-UI.ipynb                       | An example to show how to use Predict UI to display predictions from custom models                                  |
| ModelLifecycle_DataLoader.ipynb                       | Data loader to demonstrate model lifecycle and drift for existing Predict models provided in this release           |
| ModelLifecycle_FailureProbability_Training.ipynb      | Training notebook configured to set up model monitoring for Failure probability prediction                          |
| ModelLifecycle_FailureProbability_Scoring.ipynb       | Failure probability scoring notebook configured to enable model feedbacks as part of scoring                        |
| ModelLifecycle_FailureProbability_FeedbackLogging.ipynb | Illustrates how to provide user feedback for failure probability predictions to compute data & model drifts       |
| ModelLifecycle_FailureProbability_DriftCharts.ipynb   | Illustrates how to retrieve data and model drift scores from the DB and visualize the drifts                        |
| ModelLifecycle_PredictedFailureDate_Training.ipynb    | Training notebook configured to set up model monitoring for Predicted Failure date                                  |
| ModelLifecycle_PredictedFailureDate_Scoring.ipynb     | Scoring notebook for Predicted failure date configured to enable model feedbacks as part of scoring                 |
| ModelLifecycle_PredictedFailureDate_FeedbackLogging.ipynb | Illustrates providing user feedback for predicted failure date estimates to compute data & model drifts         |
| ModelLifecycle_PredictedFailureDate_DriftCharts.ipynb | Illustrates how to retrieve data and model drift scores from the DB and visualize the drifts                        |
| Unsupervised_AnomalyDetection_DataLoader.ipynb | Loads data into the data lake for Unsupervised Anomaly detection. One can also directly ingest the data from a CSV file    |
| Explainable_Unsupervised_AnomalyDetection_Training.ipynb | Training notebook for Unsupervised Anomaly Detection enabled with explainability using Saliency Maps             |
| Explainable_Unsupervised_AnomalyDetection_Scoring.ipynb | Scoring notebook for Unsupervised Anomaly detection generating explanations using saliency maps                   |
| Explainable_Unsupervised_AnomalyDetection_Results.ipynb | Illustrates how to retrieve explanations for Unsupervised Anomaly detection results using saliency maps           |
| FailureProbability_Dataloader.ipynb                     | Loads data for explainable version of Failure Probability Prediction notebook                                     |
| Explainable_FailureProbability_Training.ipynb          | Training notebook for Failure Probability Prediction enabled with explainability                                   |
| Explainable_FailureProbability_Scoring.ipynb           | Scoring notebook for Failure Prediction Probability generating explanations                                        |
| Explainble_FailureProbability_Results.ipynb            | Illustrates how to retrieve explanations for Failure Probability Predictions                                       |
| Explainable_PredictedFailureDate_Dataloader.ipynb      | Loads data for explainable version of Predicted Failure date notebook                                              |
| Explainable_PredictedFailureDate_Training.ipynb        | Training notebook for Predicted Failure date enabled with explainability                                           |
| Explainable_PredictedFailureDate_Scoring.ipynb         | Scoring notebook for Predicted Failure date generating explanations along with each scoring                        |
| Explainable_PredictedFailureDate_Results.ipynb         | Illustrates how to retrieve explanations for Predicted Failure Date estimates                                      |
| Explainable_Regression_using_SHAP.ipynb                | Illustrates how to use the SHAP technique to generate explanations for an example regression model                 |
| Auto_Imputation.ipynb                                 | Auto imputation illustrated using a dataset with single index                                                       |
| DQLearn_MissingPatternAnalysis.ipynb                  | Illustration of Missing Pattern analysis APIs                                                                       |
| FastStart2021Loader-New.ipynb                         | Data loading to support the PMI version notebooks for POC purpose.                                                  |
| PMI_SPSS_Model_Registration.ipynb                     | PMI notebook to invoke the SPSS model to detect anomalies.                                                          |
| UnregisterModels_in_Monitor.ipynb                     | Unregister the model in MAS Monitor                                                                                 |
| UpdateModel_in_Monitor.ipynb                          | Update WML Deployment Id in MAS Monitor                                                                             |
| AIX_Workaround.ipynb                                  | Notebook to troubleshoot a known issue in Explainability setup.                                                                             |

---

## Obtaining the credentials:

You need the following credentials to run the notebooks. You must be an admin in Maximo Manage to access these credentials; otherwise, get them from somebody who is an admin.

You will only have to follow these steps for the first notebook. So, if you have already created the `Predict_Envs.json` file, you can skip ahead to running this notebook.

Steps:

- APM_ID: Application Administration -> System Properties -> Filter -> Search `PMIId` -> Current Value
- APM_API_BASEURL: Application Administration -> Integration -> End Points -> Search for predict -> click search result `PREDICTAPI` -> URL (note: you only need the first part of the URL)
- APM_API_KEY: Application Administration -> Integration -> API Keys -> Copy key from user card, or Add API key for the user if API key does not exist.

These can then be placed in a file called `Predict_Envs.json`, formatted like this:

```json
{
  "APM_ID": "a1b1c1d1",
  "APM_API_BASEURL": "https://predict-url.svc",
  "APM_API_KEY": "aaabbbcccdddeeefffggghhhiiijjjkkklllmmmn"
}
```
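
Before running a notebook, it can help to confirm that the file parses and that none of the three values is missing. The following is a minimal sketch; `load_predict_envs` is a hypothetical helper written for this illustration and is not part of the Predict libraries.

```python
import json
from pathlib import Path

# The three keys described in the steps above.
REQUIRED_KEYS = ("APM_ID", "APM_API_BASEURL", "APM_API_KEY")


def load_predict_envs(path):
    """Read the credentials file and fail early if a required value is missing.

    Hypothetical helper for illustration only; not a Predict API.
    """
    envs = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not envs.get(key)]
    if missing:
        raise ValueError(f"Predict_Envs.json is missing values for: {', '.join(missing)}")
    return envs
```

For example, `load_predict_envs("Predict_Envs.json")` returns the parsed dictionary, or raises a `ValueError` that names the keys still to be filled in.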

Predict will also need a certificate in order to access the database. To get this certificate:

1. Go to the OpenShift Console for your MAS instance.
2. Go to Projects and search for monitor.
3. Go to Secrets and search for db2-certificates.
4. Click the secret that pops up and copy its value under the Data section.
5. Paste this value into a file called `db2_certificate.pem`

Upload both files, db2_certificate.pem and Predict_Envs.json, to your CP4D project.
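
A truncated or empty paste is easy to miss, so a quick check of the certificate file before uploading may save a debugging round. The sketch below only looks for the standard PEM text markers and does not validate the certificate itself; `looks_like_pem_certificate` is a hypothetical helper name.

```python
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def looks_like_pem_certificate(path):
    """Return True if the file contains a BEGIN marker followed by an END marker.

    This is a text-level sanity check only; the certificate is not parsed.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    begin = text.find(PEM_BEGIN)
    if begin == -1:
        return False
    return text.find(PEM_END, begin + len(PEM_BEGIN)) != -1
```

Calling `looks_like_pem_certificate("db2_certificate.pem")` on the file created in step 5 should return `True` if the whole secret value was copied.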

Additionally, if your CP4D instance is deployed separately from your MAS instance, you'll need to provide a few more credentials. The instructions for these are outlined in the below section.

---

### Accessing External MAS Deployment 
If your MAS instance is deployed on a different cluster than this CP4D instance, follow the steps below to get these two credentials:

#### USER_PROVIDED_HEALTH_URL:
1. Go to the OpenShift Console for your MAS instance.
2. Go to Projects and search for manage. If no projects show up, your MAS Health is deployed standalone and you just need to search for health instead.
3. Go to Routes. There should be one route that is of the format `{instance ID}-{manage or health}-main`. Copy this route.
4. Append `/maximo` to this value and add it to your `Predict_Envs.json` file with the key `USER_PROVIDED_HEALTH_URL`.
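
The last step can be sketched in Python as follows. `build_health_url` is a hypothetical helper, and the sketch assumes the route is served over HTTPS when the copied value has no scheme.

```python
def build_health_url(route):
    """Turn the copied route into the value for USER_PROVIDED_HEALTH_URL.

    Assumption: the route uses HTTPS if no scheme was copied with it.
    """
    url = route.strip().rstrip("/")
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    # Avoid appending the suffix twice if it was already added by hand.
    if url.endswith("/maximo"):
        return url
    return url + "/maximo"
```

For a copied route such as `manage.example.com` (a placeholder host), the helper returns `https://manage.example.com/maximo`.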

#### USER_PROVIDED_DB_CONNECTION_STRING:
This connection string is assembled from several values. It has the format:
```text
DATABASE={name};HOSTNAME={hostname};PORT={port};PROTOCOL=TCPIP;UID={username};PWD={password};SECURITY=SSL;SSLServerCertificate=/project_data/data_asset/db2_certificate.pem
```
To get each of these credentials:
- `name`: usually just the value `BLUDB`
- `hostname`: 
    - Go to the OpenShift Console for your MAS instance.
    - Go to Projects and search for db2u.
    - Go to Routes and find the route named `db2wh-iot`. This is the value for hostname.
- `port`:
    - Continuing from the hostname instructions above, click on the service associated with your Route.
    - On the right side, there will be two sets of ports under "Service port mapping". Select the very bottom port.
- `username`: default value is `db2inst1`
- `password`:
    - Continuing from the port instructions above, click on the Pods tab close to the top of the Service page.
    - Click into one of the pods that shows up.
    - Scroll down to the Volumes section. Click into the volume called instancepassword.
    - The value under the Data section is the value for password.

After you obtain all the above credentials, add the entire string into your Predict_Envs.json file with the key `USER_PROVIDED_DB_CONNECTION_STRING`.
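
Assembling the string by hand is error-prone, so the format above can also be built programmatically. This is a minimal sketch: `build_db_connection_string` is a hypothetical helper, and its defaults simply mirror the default values listed above. Because `;` separates the fields, the sketch rejects values that contain it rather than guessing at an escaping rule.

```python
def build_db_connection_string(
    hostname,
    port,
    password,
    name="BLUDB",
    username="db2inst1",
    certificate="/project_data/data_asset/db2_certificate.pem",
):
    """Join the collected values into the format shown above (hypothetical helper)."""
    parts = [
        ("DATABASE", name),
        ("HOSTNAME", hostname),
        ("PORT", str(port)),
        ("PROTOCOL", "TCPIP"),
        ("UID", username),
        ("PWD", password),
        ("SECURITY", "SSL"),
        ("SSLServerCertificate", certificate),
    ]
    for key, value in parts:
        if ";" in value:
            raise ValueError(f"{key} must not contain ';'")
    return ";".join(f"{key}={value}" for key, value in parts)
```

The returned string is the value to store under `USER_PROVIDED_DB_CONNECTION_STRING`.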

---

## Monitor Error and Self Signed Certificates

An `SSLCertVerificationError` can occur when using the notebooks with the IBM Maximo Monitor API.

If you are using the default IBM Maximo Predict notebooks in IBM Watson Studio and use those notebooks to call the IBM Maximo Monitor API, you may get the `SSLCertVerificationError` or the `'Database' object has no attribute 'http'` error.

To resolve the problem, add a new environment variable and then retrieve and import the Maximo Monitor certificate into the Maximo Predict notebook project:

1. Get the Monitor URL:

   1. Log in to the OpenShift console as admin.
   2. In Projects, search "monitor". Click on the project that is named `mas-{instance-id}-monitor`, where `instance-id` is the name of your MAS instance.
   3. In the Routes under Networking, search "datalake". Find the route that ends in `rest-v2-datalake`.
   4. Get the monitor_url from the Location value by keeping everything before the first "/" of the path. For example, if Location=`https://appsuite.api.monitor.predictdev.apps.hpdevocp.cp.fyre.ibm.com/api/datalake` then monitor_url=`https://appsuite.api.monitor.predictdev.apps.hpdevocp.cp.fyre.ibm.com`

2. Add a new environment variable. Paste this code into your Python notebook, replacing the URL with the value from step 1.

   ```python
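   import os

   # Replace the placeholder with the monitor_url value from step 1.
   monitor_url = "replace_me_with_monitor_url"

   # Set the environment variables before calling the Monitor API.
   os.environ['isICP'] = 'true'
   os.environ['REST_METADATA_URL'] = monitor_url
   os.environ['REST_KPI_URL'] = monitor_url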
   ```

3. Retrieve the certificate:
   1. Log in to the OpenShift console as admin.
   2. In Projects, search "monitor". Click on the project that is named `mas-{instance-id}-monitor`, where `instance-id` is the name of your MAS instance.
   3. Click Secrets under Inventory within the monitor project.
   4. Search "public-tls". Click on the secret that is named `{instance-id}-public-tls`.
   5. Copy "ca.crt" and save it to a file named `ca_public_cert.pem`.
4. Import the `ca_public_cert.pem` certificate into the CP4D project's data assets:
   1. Log in to CP4D.
   2. Go to your project
   3. Click "New data asset"
   4. Upload the `ca_public_cert.pem` file.

Return to the notebook you were having trouble with and restart the kernel. The issue should now be resolved.
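
The URL trimming described in step 1 can be sketched with the standard library. `monitor_url_from_location` is a hypothetical helper name; it keeps only the scheme and host of the route Location and drops the path.

```python
from urllib.parse import urlsplit


def monitor_url_from_location(location):
    """Keep the scheme and host of the route Location and drop the path."""
    parts = urlsplit(location.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"Expected a full URL such as https://host/path, got: {location!r}")
    return f"{parts.scheme}://{parts.netloc}"
```

With the Location from the example in step 1, the helper returns `https://appsuite.api.monitor.predictdev.apps.hpdevocp.cp.fyre.ibm.com`, which is the value to assign to `monitor_url`.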

---

## Examples / Tutorials

For additional information on how to use the notebooks, refer to the following notebooks, depending on your needs.

`PMI - Model Development Tutorial.ipynb` is your first reference. It outlines how to customize the models for your dataset and use case.  
`PMI - Custom Model Development.ipynb` shows how to build your custom models if needed.  
`PMI - Scoring Using Models Deployed on WML.ipynb` shows how to use a model that is already deployed on WML.

---

### RUL Custom Model Notes:

The following 3 notebooks show how to use an LSTM to compute the Remaining Useful Life (RUL) of a production asset. They can also be used as examples of how to develop a custom deep learning model, deploy it on WML, and have a PMI model score it. The RUL custom models provided in this package must be run in the following order:

1. Custom0-RUL-LSTM-WML
2. Custom1-RUL-LoadData
3. Custom2-RUL-Scoring

The notebook `Custom3-RUL-in-Predict-UI.ipynb` illustrates how to direct the output of custom models of this kind to the Predict dashboard. This can be optionally run at the end to have the results displayed on the Predict dashboard.

---

### Model Lifecycle Management

The current release provides model lifecycle management that involves monitoring the deployed model for drift. The component detects data and model drift for the Failure Probability Prediction (classification) and Failure Date Prediction (regression) models in the offering. These notebooks have the prefix `ModelLifecycle_` in their file names. Each set of notebooks comes with a data loader for ease of running the out-of-the-box examples.

The notebooks should be run in the following sequence: Data Loader, Training, Scoring, Feedback Logging, Drift Charts (visualization).

**NOTE** Model Lifecycle Management is supported only on WML deployment (not in the default Monitor deployment).

Model Lifecycle Management is enabled by the MAT (Monitoring And Testing) service. To connect to this service and use the lifecycle functions, you need the URL of the MAT service. To get it, follow the steps outlined below.

1. Log in to the OpenShift console.
2. Go to `Projects` and search for "predict".
3. Go down to the `Inventory` section on the `Overview` tab, and click on `Routes`.
4. Find the mat-service route and copy its location into the appropriate cell in the notebook.

The credentials to connect to the MAT service are the same as the credentials to run the notebooks.

---

### Explainability

The current release offers an enhanced explainability service that brings various techniques to enable local explanations for the predictions from three sets of models: Unsupervised Anomaly Detection, Failure Probability Prediction, and Predicted Failure Date. The models enabled for the explainability service are provided as a separate set of notebooks with the prefix `Explainable_` in their file names. Each set comes with a data loader for ease of running the out-of-the-box examples.

The notebooks should be run in the following sequence: Data Loader, Training, Scoring, Results (visualization).

**NOTE** Explainability is supported only on WML deployment (not in the default Monitor deployment).

The explainability service is enabled by the AIXTS service. To connect to this service and use the techniques, you need the URL of the explainability service. To get it, follow the steps outlined below.

1. Log in to the OpenShift console.
2. Go to `Projects` and search for `mas-<instance_id>-predict` (replace `<instance_id>` with the ID of the MAS instance).
3. Go down to the `Inventory` section and click on `Routes`.
4. Choose the route URL in the `Location` column for "aiexpts-service" and remove the `/ibm/aix/service` path. For example, if Location=`https://main.predict.xxx.xxx.com/ibm/aix/service`, then the explainability service URL is `https://main.predict.xxx.xxx.com`.

The credentials to connect to the explainability service are the same ones used to run the notebooks.
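
Removing the path suffix from the route Location can be sketched as below. `explainability_base_url` is a hypothetical helper name; the suffix it strips is the `/ibm/aix/service` path mentioned in the steps above.

```python
AIX_SUFFIX = "/ibm/aix/service"


def explainability_base_url(location):
    """Remove the trailing /ibm/aix/service path from the route Location."""
    url = location.strip().rstrip("/")
    if url.endswith(AIX_SUFFIX):
        url = url[: -len(AIX_SUFFIX)]
    return url
```

Applied to `https://main.predict.xxx.xxx.com/ibm/aix/service`, the helper returns `https://main.predict.xxx.xxx.com`. A Location without the suffix is returned unchanged.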

---

### Loading device data rows

The example notebooks with DataLoader in their names step you through connecting the modeling capabilities in Maximo Predict to IoT data originating externally. Maximo Predict currently supports Watson IoT Platform (WIOTP) as the data source for devices and sensors. To allow devices to send events to WIOTP, you must register the devices with WIOTP and set them up properly so that the device events flow into the data lake.

You can do that from the UI, as instructed in the section of the Maximo APM - Predictive Maintenance Insights Knowledge Center titled 'IoT device data integration overview'.

Alternatively, you can set it up directly within notebooks. In addition to setting up IoT devices, this method also supports bulk importing your device history data into the data lake (from CSV files). For an example, see the code section labeled 'Setup IOT Devices' in Explainable_FailureProbability_DataLoader.ipynb.

## Additional Usage Specific Notes

1. To build failure date prediction there are two **mutually exclusive choices** available:
   - Smart Regression based approach for either binary class or multiclass failure types, - OR -
   - Survival Analysis model which **supports only binary-class**  
     Make sure **not to use both techniques** for the same set of assets. UI display may not be consistent if both models are deployed for the same asset(s).
2. To build failure probability prediction there are two **mutually exclusive choices** available:
   - Multiclass classification model for multilabel / multi-type / multiclass failures, - OR -
   - Smart Classification model, which **supports only binary type failure classification**  
     Make sure **not to use both techniques** for the same set of assets. UI display may not be consistent if both models are deployed for the same assets.
3. Currently, an asset can belong to only one asset group for model building. If an asset is placed in multiple asset groups, the system may not work correctly.
4. Cloud Pak For Data(CP4D) Navigation:
   - Download the `notebook.zip` file and unzip it.
   - After you log in to your CP4D system, navigate to `Projects > New Project > Create Empty Project`
   - After you have created a project, click `Add to project`, select `Notebook`. Click `From File` and then drag and drop all the notebooks from the notebook.zip.
   - Click `Add to project`, select `Data`, then load all CSV files.
5. For WML deployment, create a new deployment space from the `Deployments` menu item in Watson Studio. Make sure to update the corresponding variable in the notebooks to reflect this deployment space. The menu item is shown in the screenshot below.
   ![image](https://media.github.ibm.com/user/11783/files/72bbe700-0064-11ed-93aa-eee591a1f548)

6. For additional help refer to the [CP4D documentation](https://www.ibm.com/support/producthub/icpdata/docs/content/SSQNUZ_current/cpd/overview/overview.html)
7. To reduce the PM lib debug output while running a notebook, set the logging level to `WARNING`, `ERROR`, or `CRITICAL`, for example with `pmlib.set_log_level("WARNING")`.
8. The predictions from some of the ML algorithms in the Unsupervised Anomaly Detection pipeline may not be explainable. If this turns out to be the case for a particular dataset, go back to the pipeline execution stage and choose a different ML algorithm that works better with explainability.
9. A couple of datasets are too big for GitHub. Files containing such large datasets are made available as zipped archives. Match the dataset file names in the WS notebooks with either the .CSV files or the .zip files provided in this bundle. If a dataset is provided as a zipped archive, unzip it to a local directory and import it into the Watson Studio project as a data asset before running the notebook.
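
For the zipped datasets in the last note, the extraction step can be done with Python's standard `zipfile` module. The sketch below uses a hypothetical helper, `extract_dataset`, and placeholder file names; it only unpacks the archive and lists the CSV files it contained, leaving the import into the Watson Studio project as a separate step.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, target_dir="."):
    """Extract a zipped dataset and return the paths of the extracted CSV files."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    # Report only the CSV members so they can be matched against the notebook.
    return sorted(target / name for name in names if name.lower().endswith(".csv"))
```

For example, `extract_dataset("dataset.zip", "datasets")` unpacks the archive into a `datasets` folder and returns the CSV paths to compare with the file names used in the WS notebook.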
