Anomaly detection in mobile sensor data using Machine Learning

This blog post is an excerpt from our solution tutorial “Gather, visualize, and analyze IoT data”. The tutorial walks you through setting up an IoT device, gathering mobile sensor data in the Watson IoT Platform, exploring the data and creating visualizations, and then using advanced machine learning services to analyze the data and detect anomalies in the historical data.

So, what is Anomaly Detection?

Anomaly detection is a technique used to identify unusual patterns, called outliers, that do not conform to expected behavior. It has many applications in business, from intrusion detection (identifying strange patterns in network traffic that could signal a hack) to health monitoring (spotting a malignant tumor in an MRI scan), and from fraud detection in credit card transactions to fault detection in operating environments.

In our day-to-day life, knowingly or unknowingly, we carry an IoT device: our mobile phone, whose built-in sensors provide accelerometer and gyroscope data. How about saving this sensor data somewhere and detecting anomalies in it?

That sounds like a cool idea, but how can we achieve it? Do I need to code an app and ask users to download it from the store? No. A simple Node.js application opened in a mobile browser will provide us with the sensor data.

This tutorial uses the following IBM Cloud products:

Here’s the flow (architecture) diagram:

So, you will create a Node.js application, open it in a mobile browser, and store the accelerometer and gyroscope data in Cloudant NoSQL DB. But how do you detect anomalies?

This is where IBM Data Science Experience comes in handy. You will use a Jupyter Notebook in the IBM Data Science Experience service to load your historical data and detect anomalies using the z-score. You will start by creating a new project and then importing the Jupyter notebook (.ipynb) from a URL.

The z-score is a standard score that indicates how many standard deviations an element is from the mean. It is calculated with the following formula: z = (X - µ) / σ, where z is the z-score, X is the value of the element, µ is the population mean, and σ is the population standard deviation.
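To make the formula concrete, here is a minimal Python 3 sketch of z-score anomaly detection using only the standard library. It is not the code from the tutorial's notebook, which works with Spark and Pandas. The find_anomalies helper and its threshold parameter are hypothetical: a reading is flagged when its absolute z-score exceeds the threshold, and the default of 3.0 is a common rule of thumb rather than a value taken from the notebook.

```python
import statistics


def z_scores(values):
    """Return z = (X - mu) / sigma for every value in the list."""
    mu = statistics.mean(values)
    # pstdev is the population standard deviation, matching sigma in the formula
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


def find_anomalies(values, threshold=3.0):
    """Return (index, value, z) for each value whose |z| exceeds threshold.

    The threshold and its default are assumptions for illustration only.
    """
    return [
        (i, x, z)
        for i, (x, z) in enumerate(zip(values, z_scores(values)))
        if abs(z) > threshold
    ]
```

For example, the values [2, 4, 4, 4, 5, 5, 7, 9] have a mean of 5 and a population standard deviation of 2, so the value 9 has a z-score of 2.0 and find_anomalies(values, threshold=1.5) returns [(7, 9, 2.0)]. Keep the window size in mind when choosing a threshold: with the population standard deviation, no |z| can exceed √(n − 1), so a threshold of 3 can only flag readings in windows of more than ten values.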

Create a new project

  1. Go to the IBM Cloud Catalog and select Data Science Experience.
  2. Create the service and launch its dashboard by clicking Get Started.
  3. Create a New Project and enter Detect Anomaly as the Name.
  4. Create and select the Object Storage and Spark services, and then click Refresh.
  5. Click Create.

Connect to Cloudant DB for data

  1. Click Assets > + Add to Project > Connection.
  2. Select the iot-db Cloudant DB where the device data is stored.
  3. Check the credentials, and then click Create.

Create a Jupyter (.ipynb) notebook

  1. Click New notebook > From URL
  2. Enter Anomaly-detection-sample for the Name.
  3. Enter https://raw.githubusercontent.com/IBM-Cloud/iot-device-phone-simulator/master/anomaly-detection/Anomaly-detection-DSX.ipynb in the URL.
  4. Click Create Notebook.

    Check that the notebook is created with metadata and code.

    The recommended kernel for this notebook is Python 2 with Spark 2.1. To change it, select Kernel > Change kernel. To trust the notebook, select File > Trust Notebook.

Run the notebook and detect anomalies

  1. Select the cell that starts with !pip install --upgrade pixiedust, and then click Run or press Ctrl + Enter to execute the code.
  2. When the installation is complete, restart the Spark kernel by clicking the Restart Kernel icon.
  3. In the next code cell, import your Cloudant credentials by completing the following steps:
    • Click 
    • Select the Connections tab.
    • Click Insert to code. A dictionary called credentials_1 is created with your Cloudant credentials. If the dictionary has a different name, rename it to credentials_1, because the remaining cells use that name and the notebook code requires it to run.
  4. In the cell with the database name (dbName), enter the name of the Cloudant database that holds the source data, for example, iotp_yourWatsonIoTPorgId_DBName_Year-month-day. To visualize data from a different device, change the values of deviceId and deviceType accordingly.

    You can find the exact database name by opening the iot-db Cloudant DB instance you created earlier and clicking Launch Dashboard.

  5. Save the notebook, and then execute the code cells one after another or run them all (Cell > Run All). By the end of the notebook, you should see anomalies in the device movement data (oa, ob, and og).

    You can change the time interval of interest to any time of day by editing the start and end values.

  6. Along with anomaly detection, the key findings or takeaways from this section are:
    • Using Spark to prepare the data for visualization.
    • Using Pandas for data visualization.
    • Bar charts and histograms of the device data.
    • The correlation between two sensors, shown through a correlation matrix.
    • A box plot for each device's sensor, produced with the Pandas plot function.
    • Density plots through kernel density estimation (KDE).
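Steps 4 and 5 boil down to picking a database, a device, and a time window, and then looking for outliers in oa, ob, and og. The Python 3 sketch below illustrates that flow with the standard library; it is a hypothetical illustration, not the notebook's code. It assumes each reading is a flat dictionary with deviceId, deviceType, timestamp, oa, ob, and og keys (your Cloudant documents may nest these fields differently), it reuses the z-score rule with an assumed threshold, and the cloudant_db_name helper simply fills in the iotp_yourWatsonIoTPorgId_DBName_Year-month-day pattern from step 4.

```python
import statistics
from datetime import date, datetime

# Movement fields the notebook reports anomalies for
AXES = ("oa", "ob", "og")


def cloudant_db_name(org_id: str, db_name: str, day: date) -> str:
    """Fill in the iotp_<orgId>_<DBName>_<Year-month-day> example pattern."""
    return "iotp_{}_{}_{}".format(org_id, db_name, day.strftime("%Y-%m-%d"))


def z_scores(values):
    """z = (X - mu) / sigma, using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(x - mu) / sigma for x in values]


def select_readings(records, device_id, device_type, start: datetime, end: datetime):
    """Keep one device's readings whose timestamp lies in [start, end].

    Assumes hypothetical flat records with deviceId, deviceType and timestamp keys.
    """
    return [
        r for r in records
        if r["deviceId"] == device_id
        and r["deviceType"] == device_type
        and start <= r["timestamp"] <= end
    ]


def anomalies_by_axis(readings, threshold=3.0):
    """Map each movement axis to the timestamps whose |z| exceeds threshold.

    The threshold default is an assumed rule of thumb, not the notebook's value.
    """
    result = {axis: [] for axis in AXES}
    if not readings:
        return result
    for axis in AXES:
        values = [r[axis] for r in readings]
        for r, z in zip(readings, z_scores(values)):
            if abs(z) > threshold:
                result[axis].append(r["timestamp"])
    return result
```

For instance, if the eight readings selected for one device have oa values of 2, 4, 4, 4, 5, 5, 7, and 9, anomalies_by_axis(readings, threshold=1.5) flags only the timestamp of the reading with oa = 9, and cloudant_db_name("myorg", "default", date(2018, 1, 15)) returns "iotp_myorg_default_2018-01-15" (both argument values are made up for illustration).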

Technical Offering Manager & Polyglot Programmer | IBM Cloud
