When I first joined IBM in 2007 it seemed somewhat anachronistic that the Toronto Software Lab was managed by the leader of the Sensors and Actuators group. Now it seems prescient. As we consider the Internet of Things and see that all the physical objects around us have a useful place in the world of information, we see that our information assets can be viewed from a more traditionally physical perspective as well.
Databases are one of the most important assets that we have in an organization, certainly equal to our physical assets. As we consider the value in all the physical sensor information available, telling us who entered and exited every building, showing us through RFID tags what components flowed through an assembly line and so on, we should recognize the value of sensors on our databases as well.
I’ll focus on a particular type of sensor, one that provides a stream of the data changes occurring in the database. IBM’s Cloudant database provides a REST API that delivers a sensor stream of changes. IBM InfoSphere Data Replication can provide a sensor stream of changes from your distributed and mainframe-based relational databases, as well as from non-relational databases such as IMS and VSAM.
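To make the idea of a change stream concrete, here is a minimal sketch of reading one. It assumes the CouchDB-style `_changes` endpoint that Cloudant exposes, with its `feed=continuous`, `since` and `include_docs` query parameters. The account URL and database name are hypothetical, and authentication and the HTTP call itself are left out so that the parsing logic can be run on its own.

```python
import json
from urllib.parse import urlencode


def changes_url(base_url, db, since="0"):
    """Build the URL for a continuous changes feed (hypothetical account and database)."""
    query = urlencode({"feed": "continuous", "since": since, "include_docs": "true"})
    return f"{base_url.rstrip('/')}/{db}/_changes?{query}"


def parse_changes(lines):
    """Turn the lines of a continuous feed into simple change events.

    Each non-empty line is assumed to be one JSON object. Blank lines are
    heartbeats, and a line carrying "last_seq" marks the end of the feed.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # heartbeat, nothing changed
        event = json.loads(line)
        if "last_seq" in event:
            continue  # end-of-feed marker, not a document change
        yield {
            "seq": event["seq"],
            "id": event["id"],
            "deleted": event.get("deleted", False),
            "revs": [change["rev"] for change in event["changes"]],
        }
```

In practice the lines would come from a streaming HTTP response for `changes_url(...)`. Feeding `parse_changes` a list of strings is enough to experiment with the event shape.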
The original role for data replication technology was to enable low-impact, low-latency data movement. Data replication technology captures the changes occurring on the source database quickly, with minimal impact on that database, and without requiring any changes to the database application. InfoSphere Data Replication captures changes from the database recovery logs. These traits make it ideal as a sensor.
Data replication has always had a role as an audit tool. Government regulations require certain industries to maintain an audit trail for their key data. Traditionally, data mining was rarely done on these audit trails (let’s call them database sensor logs). The database sensor logs were kept primarily to meet the regulatory requirements.
Over time some industries have begun performing analytics on these sensor logs. Banks are using machine learning techniques to identify potential fraud events. Cell phone companies have been using streaming analytics to identify upsell opportunities. This use of analytics will grow as the Internet of Things continues to drive better analytics tools and create more data scientists experienced at working with sensor data.
I often talk with clients as they begin to create an exploratory zone. They all understand the importance of having a copy of their database data in this exploratory zone and are interested in data replication technology as a way of keeping that copy current. For exploratory zones built around Hadoop, it is easy to explain the advantages of using a database sensor log to provide that data, because it suits the natural processing model of HDFS and Hive. Data replication can deliver the sensor log as a series of files stored in HDFS, and the data scientist can create Hive views over those files that show either the entire audit trail or a collapsed version containing only the latest contents. Access to an audit trail is essentially a free side effect of the most practical method of providing data scientists with a current copy of the data, and it suits the general philosophy that one should not discard data on the way into the exploratory zone.
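Collapsing an audit trail down to the latest contents can be illustrated with a small sketch. It is written in Python rather than as a Hive view, and the record layout (`seq`, `key`, `op`, `row`) is an assumption made for illustration, not the format that InfoSphere Data Replication actually writes.

```python
def collapse_audit_trail(entries):
    """Reduce a change log to the latest surviving row for each key.

    Each entry is assumed to be a dict with:
      seq - a number that orders the changes
      key - the primary key of the changed row
      op  - "INSERT", "UPDATE" or "DELETE"
      row - the row image after the change (ignored for deletes)
    """
    latest = {}
    # Replay the changes in order so that later ones overwrite earlier ones.
    for entry in sorted(entries, key=lambda e: e["seq"]):
        if entry["op"] == "DELETE":
            latest.pop(entry["key"], None)  # the row no longer exists
        else:
            latest[entry["key"]] = entry["row"]
    return latest
```

The full list of entries remains available as the audit trail, and the collapsed dictionary is a derived view of it. That is why keeping the log costs nothing extra: the current copy can always be rebuilt from the history, but not the other way around.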
Most of our clients are just beginning the process of discovering the valuable questions that can be answered using this sensor log. An interesting difference between a database sensor log and a conventional physical sensor log is that the physical sensor log is often the primary source for both the current state of the physical object and the history of that state. You may learn both the current temperature of the engine block and the changes in that temperature over time. Many of the ideas discussed around the Internet of Things, such as the connected car, primarily leverage information about the current state. This sort of analytics around the current state is already in place for databases. If you want to look to the Internet of Things to seed your thinking about what you may be able to get from database sensor logs, you need to focus on the use cases that depend on the history, not just the current state.
The use of personal fitness trackers to identify when a person with mobility issues may have fallen is an example that requires history. It seems quite similar to the fraud detection that is already being done with database sensor logs. Some aspects of the connected car do depend on history: tracking the changes over time between two different sensors, say RPM and oil pressure, to ensure they maintain the expected relationship as they change. This might be analogous to comparing the database sensor log with the click stream from your application to confirm how many clicks it takes to make specific types of updates to your system of record.
I think we are just scratching the surface here. I’m interested to see what other answers we will find. I encourage you to add a database sensor log to the assets you make available to your data scientists.