Automatic detection of anomalies in sensor data from buildings


Clients who maintain buildings spend many months building rules to create alarms when something goes wrong. However:

  1. It’s time-consuming and tedious to enter all the rules manually.
  2. Even once a large number of rules have been established, the system is still rather fragile and produces lots of false alarms. This is expensive because it wastes the engineers’ time.
  3. The system cannot adapt over time (e.g. to a new employee who likes to keep their office very cold).

The objective of this piece of work is to use IBM’s “big data” tools to learn, from data gathered both inside the building and outside it, which conditions require an engineer’s attention, without anyone having to write rules. This should make commissioning a new building, or extending an existing one, simpler and much cheaper. Instead of requiring rules to be entered manually, the system would sift through all the historical sensor data from the building to learn the dominant patterns and relationships. Once a model has been learnt, the system should produce an ‘anomaly’ score for any new data, and could also update its model over time. If a suitable model is used (a ‘predictive’ model as opposed to a ‘discriminative’ one) then the system could also make predictions. Our plan is to build a prototype of this system, whilst also recording a large, home-grown dataset to show off IBM’s big data tools.

Data collection

The Hursley building management system has over 15,000 objects. These are connected using a protocol called BACnet (Building Automation and Control NETworks) over IP. This BACnet IP network is physically separate from IBM’s ‘9’ network. We have written an application which continually polls all 15,000 objects on the network to request their present value. It takes a little under an hour to poll all 15,000 objects once. We store the data locally on the logging machine. Every midnight, the logging machine disconnects from the BACnet network, connects to the IBM ‘9’ network, squirts the previous day’s data to the ETS instance of BigInsights, and then reconnects to the BACnet network to continue data collection.
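The polling loop described above can be sketched as follows. This is a minimal illustration, not the actual logger: `read_present_value` is a hypothetical callable standing in for the real BACnet ReadProperty request, and the CSV file stands in for the local store.

```python
import csv
from datetime import datetime, timezone

def poll_once(object_ids, read_present_value, out_path):
    """Poll every BACnet object once and append timestamped rows to a
    local CSV file.

    `read_present_value` is a hypothetical stand-in for the real BACnet
    read request: it takes an object ID and returns its present value
    (or raises on failure).
    """
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        for obj_id in object_ids:
            ts = datetime.now(timezone.utc).isoformat()
            try:
                value = read_present_value(obj_id)
            except Exception:
                value = ""  # record missing data rather than abort the sweep
            writer.writerow([ts, obj_id, value])
```

Recording a blank value on a failed read, rather than stopping, matters here: with 15,000 objects per hour-long sweep, occasional timeouts are inevitable, and the gaps themselves are informative (they appear as the red ‘missing data’ regions in the plot below).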

Spotting patterns

Before we can build a statistical model of the data, we need to visualise it to get a feel for what’s going on.

The plot below shows 3 weeks of data for about 100 BACnet objects. The x-axis represents time, with ticks and grid lines positioned at midnight each day. Each row represents a sensor (i.e. each tick on the y-axis is a single sensor). The sensors have been ordered by how well they correlate, so sensors close together behave similarly. The output for each sensor has been linearly mapped to the range 0 to 1. Red indicates missing data.
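The two preprocessing steps behind the plot can be sketched in a few lines of NumPy. The normalisation is as described above; the greedy correlation-based ordering is one simple way to place similar sensors next to each other (the post doesn’t specify the exact ordering method used, so treat this as an illustrative assumption).

```python
import numpy as np

def normalise_rows(data):
    """Linearly map each sensor's readings (one row each) to [0, 1]."""
    lo = data.min(axis=1, keepdims=True)
    hi = data.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid divide-by-zero for flat sensors
    return (data - lo) / span

def order_by_correlation(data):
    """Greedy ordering: start from sensor 0, then repeatedly append the
    unplaced sensor most correlated with the last one placed."""
    corr = np.corrcoef(data)
    n = data.shape[0]
    order = [0]
    remaining = set(range(1, n))
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: corr[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

For example, given three sensors where the first and third rise together and the second falls, `order_by_correlation` places the two rising sensors adjacently.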

Some objects show daily and weekly patterns (for example the ‘cooling setpoints’ and the ‘internal room temperatures’ marked on the plot below). Some objects do not appear to follow any obvious pattern. Some objects return discrete values (e.g. ‘active’ or ‘inactive’) whilst others report continuous values.


Modelling approaches

The next phase of the project will be to build statistical models for the data. The first approach will be to use fairly simple statistical techniques (probably little more complex than is taught on A-Level stats modules). For continuous-valued objects which follow a regular daily pattern, we will learn a simple normal distribution for each hour of the day. For continuous-valued objects which do not follow a daily pattern, we will just use regression. For discrete-valued objects, we will try using Markov chains conditioned on the hour of the day. Once this is done, we will look at simple ways to model correlations between objects.
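The per-hour normal distribution idea is simple enough to sketch directly. This is an illustrative outline of the approach described above, not the project’s implementation: fit a mean and standard deviation for each hour of the day, then score a new reading by how many standard deviations it sits from that hour’s mean.

```python
import numpy as np

def fit_hourly_gaussians(values, hours):
    """Fit an independent normal distribution to a sensor's readings for
    each hour of the day (24 means and standard deviations).

    `values` and `hours` are parallel arrays: a reading and the hour
    (0-23) at which it was taken.
    """
    model = {}
    for h in range(24):
        v = values[hours == h]
        model[h] = (v.mean(), v.std() + 1e-9)  # epsilon guards hours with no variation
    return model

def anomaly_score(model, value, hour):
    """Score a new reading as its absolute z-score under that hour's
    distribution; large scores flag behaviour worth an engineer's look."""
    mean, std = model[hour]
    return abs(value - mean) / std
```

A reading of 21 °C at 3 a.m. might be perfectly normal for a server room yet highly anomalous for an unoccupied office, and conditioning on the hour captures exactly that distinction without any hand-written rule.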

In parallel with this statistical modelling, we will build a pretty user interface to show to clients.

If this all works, and if there’s time left, then we might try some sexier statistical techniques like recurrent neural networks (especially ‘long short-term memory’ (LSTM) networks, which excel at modelling time series data). These models are computationally expensive to train, so we might need to train on a fast GPU. Or, further into the future, maybe the code could be re-implemented on IBM’s new ‘neurosynaptic’ chips produced under the SyNAPSE project.
