Configuring the notebook files

Before you can configure custom functions for ONNX models and use those functions for streaming data metrics, you must set up the notebook files to support the functions. The files include a configuration file, a requirements file, the notebook, and sample data.

Before you begin

Ensure that you have administrative access to the IoT tool and Maximo® Monitor. You also need a Kaggle account, and an IoT tool API key and token with the Standard Application role. The following procedure uses public data that is available from Kaggle.

Review the restrictions before you start. For more information, see Configuring custom functions for streaming data metrics.

Procedure

Create a root directory or folder for the notebook files. This directory or folder will contain all of the files and sample data for the notebook.
Create the config.json file and paste the following content into the file:
```
{
    "IOT_URL": "domain",
    "IOT_API_KEY": "API key",
    "IOT_API_TOKEN": "token"
}
```
Replace the domain placeholder with the IoT tool domain. To retrieve the domain, open the IoT tool in a web browser and copy the host name from the URL, for example, tenant_id.iot.mas_instance_id.domain.com. Do not include the https:// prefix or a trailing path, because the notebook adds them when it builds the API URL. Replace the API key and token placeholders with the IoT tool API key and token.
Save the config.json file to the root directory or folder that you created in step 1.
If your IBM® Maximo Application Suite instance uses certificates that are not signed by a well-known certificate authority (CA), download the CA .pem file and save the file as mas_ca.pem at the same level in the root directory or folder as the config.json file.
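As an optional sanity check before you run the notebook, the following sketch loads config.json the same way the notebook does, rejects values that still hold the template placeholders or that include a scheme, and picks the TLS `verify` setting depending on whether mas_ca.pem exists. The function name `load_iot_config` and the validation rules are illustrative assumptions, not part of the tutorial files.

```python
import json
import os

# Placeholder values from the config.json template in this procedure
PLACEHOLDERS = {"IOT_URL": "domain", "IOT_API_KEY": "API key", "IOT_API_TOKEN": "token"}


def load_iot_config(root_dir):
    """Hypothetical helper: load root_dir/config.json, validate it, choose TLS verify."""
    with open(os.path.join(root_dir, "config.json"), encoding="utf-8") as f:
        config = json.load(f)

    # Every key must be present and no longer hold its template placeholder
    unset = [key for key, placeholder in PLACEHOLDERS.items()
             if config.get(key) in (None, "", placeholder)]
    if unset:
        raise ValueError(f"Set a real value for: {', '.join(unset)}")

    # The notebook builds https://{IOT_URL}/api/..., so IOT_URL must be a bare host name
    if "/" in config["IOT_URL"]:
        raise ValueError("IOT_URL must not include a scheme such as https:// or a path")

    # Use mas_ca.pem when it exists; True means the default CA bundle used by requests
    ca_path = os.path.join(root_dir, "mas_ca.pem")
    verify = ca_path if os.path.isfile(ca_path) else True
    return config, verify
```

For example, `config, verify = load_iot_config(".")` run from the root directory returns a value for `verify` that can be passed to `requests` calls, as the notebook does with its own `verify` variable.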

Create the notebook.

In the root directory or folder, create a notebooks folder.
Create a root/notebooks/onnx_tutorial_1.ipynb file.

Paste the following content into the file.

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51dfe6a2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# -----------------------------------------------------------\n",
    "# Licensed Materials - Property of IBM\n",
    "# 5737-M66, 5900-AAA, 5900-A0N, 5725-S86, 5737-I75\n",
    "# (C) Copyright IBM Corp. 2020, 2021, 2022, 2023 All Rights Reserved.\n",
    "# US Government Users Restricted Rights - Use, duplication, or disclosure\n",
    "# restricted by GSA ADP Schedule Contract with IBM Corp.\n",
    "# -----------------------------------------------------------"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb7bbf6e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import plotly.graph_objects as go\n",
    "import numpy as np\n",
    "import sklearn.ensemble\n",
    "import sklearn.model_selection\n",
    "import sklearn.metrics\n",
    "import mlprodict.onnxrt\n",
    "import jyquickhelper\n",
    "import skl2onnx\n",
    "import onnx\n",
    "import onnxruntime as rt\n",
    "import base64\n",
    "import json\n",
    "import requests\n",
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "780fb7c9",
   "metadata": {},
   "source": [
    "# Loading and inspecting the training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96c83ac9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Data from Kaggle: https://www.kaggle.com/datasets/nphantawee/pump-sensor-data\n",
    "df = pd.read_csv('../data/sensor.csv')\n",
    "df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d6e5787",
   "metadata": {},
   "outputs": [],
   "source": [
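    "# Take a brief look at how the numeric features are correlated\n",
    "# (numeric_only=True skips the text columns such as timestamp and machine_status)\n",
    "corr = df.corr(numeric_only=True)\n",
    "fig = go.Figure(\n",
    "    data=go.Heatmap(\n",
    "        x=list(corr.columns),\n",
    "        y=list(corr.columns),\n",
    "        z=corr.values,\n",
    "        colorscale='viridis'\n",
    "    ))\n",
    "fig.update_layout(title='Feature Correlation', title_x=0.5)\n",
    "fig.show()"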
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ac70473e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The categorical feature we want to predict is machine_status.\n",
    "# The heatmap shows that several features are highly correlated, for example sensor_13 and sensor_25, so selecting one of them for classification is sufficient.\n",
    "# At the expense of some accuracy, we simplify the model further and use only 3 features.\n",
    "# We also drop rows that contain NaNs, and encode machine_status as machine_state:\n",
    "# NORMAL -> 0, BROKEN -> 1, anything else (RECOVERING) -> 2.\n",
    "\n",
    "df_i = df[['sensor_00','sensor_13','sensor_37','timestamp', 'machine_status']].dropna()\n",
    "df_i.timestamp = pd.to_datetime(df_i.timestamp.values)\n",
    "df_i['machine_state'] = np.where(\n",
    "    df_i['machine_status'] == 'NORMAL', 0, \n",
    "    np.where(\n",
    "        df_i['machine_status'] == 'BROKEN', 1, \n",
    "        2\n",
    "    )\n",
    ")\n",
    "np.unique(df_i['machine_state'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa5fe7ad",
   "metadata": {},
   "source": [
    "## Using sklearn to train a RandomForestClassifier model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "020d24ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Try a straightforward random forest classifier (RFC)\n",
    "\n",
    "X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(\n",
    "    df_i[['sensor_00','sensor_13','sensor_37']], \n",
    "    df_i['machine_state'], \n",
    "    random_state=42\n",
    ")\n",
    "\n",
    "\n",
    "rfc = sklearn.ensemble.RandomForestClassifier(random_state=42)\n",
    "rfc.fit(X_train, y_train)\n",
    "y_pred = rfc.predict(X_test)\n",
    "np.count_nonzero(y_pred == 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb484ded",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute and visualise the confusion matrices (current label vs. others) to validate the accuracy of our classifier\n",
    "\n",
    "y_unique = y_test.unique()\n",
    "\n",
    "mcm = sklearn.metrics.multilabel_confusion_matrix(y_test, rfc.predict(X_test), labels = y_unique)\n",
    "\n",
    "# and plot them\n",
    "disp = []\n",
    "for i in (0,1,2):\n",
    "    disp.append(sklearn.metrics.ConfusionMatrixDisplay(confusion_matrix=mcm[i], display_labels=['OTHERS', y_unique[i]]))\n",
    "    disp[i].plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "681222ad",
   "metadata": {},
   "source": [
    "## Converting to an ONNX model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0025b4ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "initial_type = [('float_input', skl2onnx.common.data_types.FloatTensorType([None, 3]))]\n",
    "\n",
    "onx = skl2onnx.to_onnx(\n",
    "    rfc, \n",
    "    initial_type, \n",
    "    options={'zipmap': False},\n",
    "    target_opset={'': 15}\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f32c23bb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualise the RFC model\n",
    "sess = mlprodict.onnxrt.OnnxInference(onx)\n",
    "dot = sess.to_dot()\n",
    "jyquickhelper.RenderJsDot(dot)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17edd80f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The RFC model above returns a probability value for each label, for example [0.1, 0.3, 0.6]\n",
    "# We are only interested in the probability for the \"chosen\" status value - i.e. the maximum probability value\n",
    "# To achieve this, we build an additional ONNX model that includes a \"ReduceMax\" node and compose it with the sklearn ONNX model above\n",
    "\n",
    "node = onnx.helper.make_node(\n",
    "    \"ReduceMax\",\n",
    "    inputs=[\"probabilities_in\"],\n",
    "    outputs=[\"probability\"],\n",
    "    domain=\"\",\n",
    "    keepdims=0,\n",
    "    axes=[1]\n",
    ")\n",
    "\n",
    "X = onnx.helper.make_tensor_value_info(\"probabilities_in\", onnx.TensorProto.FLOAT, [None, 3])\n",
    "Y = onnx.helper.make_tensor_value_info(\"probability\", onnx.TensorProto.FLOAT, [None])\n",
    "\n",
    "\n",
    "graph_def = onnx.helper.make_graph(\n",
    "    [node],     # nodes\n",
    "    \"Max out\",  # name\n",
    "    [X],        # inputs\n",
    "    [Y],        # outputs\n",
    ")\n",
    "\n",
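    "model_def = onnx.helper.make_model(\n",
    "    graph_def, \n",
    "    opset_imports=[\n",
    "        onnx.helper.make_opsetid('', 15)\n",
    "    ],\n",
    "    # merge_models requires both models to share the same IR version\n",
    "    ir_version=onx.ir_version\n",
    ")\n",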
    "\n",
    "combined_model = onnx.compose.merge_models(\n",
    "    onx,\n",
    "    model_def,\n",
    "    io_map=[(\"probabilities\", \"probabilities_in\")],\n",
    ")\n",
    "\n",
    "combined_model_data = combined_model.SerializeToString()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42954fef",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# visualize the composed model\n",
    "sess = mlprodict.onnxrt.OnnxInference(combined_model)\n",
    "dot = sess.to_dot()\n",
    "jyquickhelper.RenderJsDot(dot)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b0d0e0b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Try executing the model with a few sample inputs\n",
    "options = rt.SessionOptions()\n",
    "session = rt.InferenceSession(combined_model_data, sess_options=options)\n",
    "\n",
    "input_name = session.get_inputs()[0].name\n",
    "label_name = session.get_outputs()[0].name\n",
    "probs_name = session.get_outputs()[1].name\n",
    "\n",
    "outputs = session.run([label_name, probs_name],\n",
    "    {\n",
    "        input_name: [\n",
    "            [1.220891,3.3998150000000003,114.9087],\n",
    "            [0.0,0.2227016,116.8404],\n",
    "            [0.0,0.2370912,114.2079],\n",
    "            [0.0,0.3187984,110.5263],\n",
    "        ]\n",
    "    }\n",
    ")\n",
    "\n",
    "for output in outputs:\n",
    "    print(output)\n",
    "\n",
    "# And verify that we see the expected outputs\n",
    "assert np.array_equal(outputs[0], [0,0,1,2])\n",
    "assert np.allclose(outputs[1], [0.99999934, 0.6199997, 0.5999997, 0.88999945])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9f50b57",
   "metadata": {},
   "source": [
    "# Register the ONNX model with the IoT tool API\n",
    "\n",
    "To execute the model as a streaming data metric in Maximo Monitor, we first need to register the model with the IoT tool."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ae772d3",
   "metadata": {},
   "source": [
    "## Convert the ONNX binary into a format suitable for submission to the IoT tool HTTP API\n",
    "\n",
    "In the IoT tool, each ONNX model is represented as a JSON object that includes the ONNX binary `data`, encoded as base64, an `id` that is used to reference the model, a human-readable `description`, and a model `type` discriminator:\n",
    "```json\n",
    "{\n",
    "    \"id\": \"onnx_tutorial_1.onnx\",\n",
    "    \"description\": \"Classifies status of a machine based on 3 sensor readings. Output 'label' is an integer representing status: one of 0 (NORMAL), 1 (BROKEN), 2 (RECOVERING). Output 'probability' is a float representing the model's confidence in the classification.\",\n",
    "    \"type\": \"ONNX\",\n",
    "    \"data\": \"<base64-encoded-onnx-binary>\"\n",
    "}\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4d8e073",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_base64_bytes = base64.b64encode(combined_model_data)\n",
    "model_base64_str = model_base64_bytes.decode('utf-8')\n",
    "\n",
    "model_id = \"onnx_tutorial_1.onnx\"\n",
    "\n",
    "model_dict = {\n",
    "    \"id\": model_id,\n",
    "    \"description\": \"Classifies status of a machine based on 3 sensor readings. Output 'label' is an integer representing status: one of 0 (NORMAL), 1 (BROKEN), 2 (RECOVERING). Output 'probability' is a float representing the model's confidence in the classification.\",\n",
    "    \"type\": \"ONNX\",\n",
    "    \"data\": model_base64_str\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bd47ea0",
   "metadata": {},
   "source": [
    "## Issue a request to the IoT tool HTTP API to create/update the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f08be62",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load config\n",
    "with open(\"../config.json\", \"rb\") as f:\n",
    "    config_json = json.load(f)\n",
    "    IOT_URL = config_json['IOT_URL']\n",
    "    IOT_API_KEY = config_json['IOT_API_KEY']\n",
    "    IOT_API_TOKEN = config_json['IOT_API_TOKEN']\n",
    "\n",
    "verify=None\n",
    "if os.path.exists(\"../mas_ca.pem\"):\n",
    "    verify=\"../mas_ca.pem\"\n",
    "\n",
    "models_url = f'https://{IOT_URL}/api/v0002/pipeline/models'\n",
    "model_url = f'{models_url}/{model_id}'\n",
    "\n",
    "\n",
    "\n",
    "# Has the model been previously registered?\n",
    "get_res = requests.get(\n",
    "    url=model_url,\n",
    "    verify=verify,\n",
    "    auth=(IOT_API_KEY, IOT_API_TOKEN),\n",
    ")\n",
    "\n",
    "\n",
    "if get_res.status_code == 200:\n",
    "    # The model already exists, update it\n",
    "    put_res = requests.put(\n",
    "        url=model_url,\n",
    "        verify=verify,\n",
    "        auth=(IOT_API_KEY, IOT_API_TOKEN),\n",
    "        json=model_dict\n",
    "    )\n",
    "    if put_res.status_code == 200:\n",
    "        print(\"Model updated successfully\")\n",
    "    else:\n",
    "        raise Exception(f\"Bad response from PUT {model_url}: {put_res.status_code} {put_res.text}\")\n",
    "        \n",
    "elif get_res.status_code == 404:\n",
    "    # The model does not exist yet, create it\n",
    "    post_res = requests.post(\n",
    "        url=models_url,\n",
    "        verify=verify,\n",
    "        auth=(IOT_API_KEY, IOT_API_TOKEN),\n",
    "        json=model_dict\n",
    "    )\n",
    "    if post_res.status_code == 201:\n",
    "        print(\"Model created successfully\")\n",
    "    else:\n",
    "        raise Exception(f\"Bad response from POST {models_url}: {post_res.status_code} {post_res.text}\")\n",
    "\n",
    "else:\n",
    "    raise Exception(f\"Bad response from GET {model_url}: {get_res.status_code} {get_res.text}\")\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  },
  "vscode": {
   "interpreter": {
    "hash": "cafd87664a0a85b17736ebdfd95a92a30730b501d976605e8ea591587133b199"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
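The combined model in the notebook returns two outputs per input row: the label from the random forest and the probability from the appended ReduceMax node (axis 1, keepdims=0). The following pure-Python sketch reproduces that relationship from rows of per-class probabilities so that you can reason about the outputs. It assumes that the class labels are 0, 1, and 2, as produced by the machine_state encoding, so that a label equals its column index; the function name is hypothetical, and this is an illustration of the logic rather than how ONNX Runtime executes the graph.

```python
def label_and_probability(probabilities):
    """Hypothetical helper mimicking the composed model's label and probability outputs.

    Assumes class labels 0, 1, 2, so a label equals its column index.
    """
    labels = []
    maxima = []
    for row in probabilities:
        # Random forest label: the class with the largest probability (first one on ties)
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(best)
        # ReduceMax over axis 1 with keepdims=0: one scalar per input row
        maxima.append(row[best])
    return labels, maxima
```

For example, `label_and_probability([[0.1, 0.3, 0.6], [0.62, 0.38, 0.0]])` returns `([2, 0], [0.6, 0.62])`: the second value in each pair is the confidence for the chosen label, which is why a single `probability` output is enough for the streaming metric.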

Create a root/requirements.txt file and paste the following content into the file:

```
jupyter
pandas
matplotlib
scikit-learn
skl2onnx
onnx
onnxruntime
mlprodict
jyquickhelper
plotly
requests
```
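Package names in requirements.txt do not always match import names; for example, scikit-learn is imported as sklearn. After you install the requirements, a sketch like the following can confirm that every top-level module that the notebook's import cell uses is available in the active environment. The module list mirrors that import cell, and the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Top-level modules imported by the tutorial notebook
NOTEBOOK_MODULES = [
    "pandas", "plotly", "numpy", "sklearn", "mlprodict", "jyquickhelper",
    "skl2onnx", "onnx", "onnxruntime", "requests",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(NOTEBOOK_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```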

Create the sample data.

Create a root/data folder.
Download the sensor.csv.zip file from the Kaggle pump sensor data set (https://www.kaggle.com/datasets/nphantawee/pump-sensor-data) and extract the sensor.csv file to the root/data folder.
Delete the sensor.csv.zip file.
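If you prefer to script the extraction and cleanup, the following sketch uses the standard zipfile module. It assumes that the downloaded archive is named sensor.csv.zip and contains sensor.csv at its top level; adjust the names if your download differs. The function name is hypothetical.

```python
import os
import zipfile


def extract_sensor_csv(zip_path, data_dir, member="sensor.csv"):
    """Hypothetical helper: extract one CSV into data_dir, then delete the archive."""
    os.makedirs(data_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Raises KeyError if the archive has no member with this exact name
        extracted = archive.extract(member, path=data_dir)
    # Mirrors the step that deletes sensor.csv.zip after extraction
    os.remove(zip_path)
    return extracted
```

Run from the root directory, `extract_sensor_csv("sensor.csv.zip", "data")` would leave data/sensor.csv in place, which is the relative path `../data/sensor.csv` that the notebook reads from the notebooks folder.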

Create a root/data/onnx_tutorial_1.csv file and paste the following content into the file:

"sensor_00","sensor_13","sensor_37"
1.5337379999999998,3.459598,115.8414
1.349769,3.505052,113.0305
1.290741,3.520435,114.9087
1.220891,3.3998150000000003,114.9087
0.0,0.2227016,116.8404
0.0,0.2370912,114.2079
0.0,0.3187984,110.5263
0.0,0.3428673,115.3448
0.0,0.3046647,105.0438
0.0,0.6061783000000001,113.034
0.3059607,0.0,113.3796
0.3059606552124019,0.0,112.7037
0.3059607,0.0,113.8783
0.3069445,0.0,125.0
0.3059607,0.0,115.674
0.3069445,0.0,121.5109
0.3020255,0.0,108.9471
0.3059607,0.0,130.3036
0.3059606552124019,0.0,130.5688
0.3059607,0.0,116.8488
0.3069445,0.0,121.9446
0.3039930999999999,0.0,137.9632
0.3049769,0.0,131.744
0.3059606552124019,0.0,127.6902
0.3069445,0.0,118.678
0.3049769,0.004356791,121.1815
0.3059607,0.001376911,121.2919
0.3059607,0.05230475,123.3779
0.3039930999999999,0.02334995,119.5908
0.3069445,0.1723983999999999,126.0616
0.3059607,0.0,118.6551
0.3039930999999999,0.0,118.4787
0.3039930999999999,0.003226684,117.1901
0.3059607,0.0,116.0482
0.3059606552124019,0.0,129.3549
0.3059607,0.0480354,127.6291
0.3069445,0.0,114.7538
0.3049769,0.0,116.954
0.3049769,0.003536452,117.5382
0.3069445,0.0,121.9531
0.3059607,0.0,124.4986
0.3039930999999999,0.0,134.5426
0.3039930999999999,0.0,131.4489
0.3059607,0.0,136.1429
0.3059607,0.0,121.9824
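The columns in this file follow the same order as the training features in the notebook (sensor_00, sensor_13, sensor_37), which matches the model's declared `[None, 3]` float input. A minimal sketch for reading the file with the standard csv module and checking its shape follows; the helper name and the validation rules are assumptions for illustration.

```python
import csv

# Column order must match the features the model was trained on
FEATURES = ["sensor_00", "sensor_13", "sensor_37"]


def load_feature_rows(path):
    """Hypothetical helper: read the CSV into [sensor_00, sensor_13, sensor_37] float rows."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        # csv.reader removes the quotes around the header names
        header = next(reader)
        if header != FEATURES:
            raise ValueError(f"Unexpected header: {header}")
        rows = []
        for line_no, record in enumerate(reader, start=2):
            if not record:
                continue  # skip blank lines
            if len(record) != len(FEATURES):
                raise ValueError(f"Line {line_no}: expected 3 values, got {len(record)}")
            rows.append([float(value) for value in record])
    return rows
```

For example, rows 4 to 7 of this file (`rows[3:7]`) are the same inputs that the notebook's test cell passes to `session.run`.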

Confirm that the root directory or folder that you created in step 1 contains the following folders and files:
- root/config.json
- root/requirements.txt
- root/notebooks/onnx_tutorial_1.ipynb
- root/data/sensor.csv
- root/data/onnx_tutorial_1.csv
- root/mas_ca.pem (only if your instance uses certificates that are not signed by a well-known CA)
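The following sketch automates this check from Python. The file list mirrors the list above, with mas_ca.pem checked only when you indicate that your instance needs it; the function name and the `needs_ca` flag are hypothetical.

```python
import os

# Files that the procedure expects under the root directory or folder
EXPECTED_FILES = [
    "config.json",
    "requirements.txt",
    os.path.join("notebooks", "onnx_tutorial_1.ipynb"),
    os.path.join("data", "sensor.csv"),
    os.path.join("data", "onnx_tutorial_1.csv"),
]


def missing_files(root_dir, needs_ca=False):
    """Hypothetical helper: list expected files that are absent under root_dir."""
    expected = list(EXPECTED_FILES)
    if needs_ca:
        # Only required for instances with certificates from a non-well-known CA
        expected.append("mas_ca.pem")
    return [rel for rel in expected if not os.path.isfile(os.path.join(root_dir, rel))]
```

An empty list from `missing_files(".")`, run from the root directory, means that the layout is complete.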

In a command line, use the cd command to change to the root directory or folder, and then run the following commands:

```
python3.9 -m venv .pyenv
source .pyenv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
jupyter notebook
```
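To confirm that the notebook kernel runs in the .pyenv environment rather than a system Python, you can run a check like the following in a notebook cell. It relies on the standard venv behavior of setting `sys.prefix` to the environment directory while `sys.base_prefix` keeps pointing at the base installation; the function name is illustrative.

```python
import sys


def running_in_venv():
    """Return True when this interpreter runs inside a venv-created environment."""
    # venv changes sys.prefix but leaves sys.base_prefix at the base installation
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}, running in a virtual environment: {running_in_venv()}")
```

If the check prints False in the notebook, the kernel was probably not started from the activated .pyenv environment.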

Results

The Jupyter interface appears in a browser. You can now load and run the tutorial notebooks.