Big Data unlocks more information than traditional storage approaches can: relational databases, file stores, streaming sources, document repositories, unstructured content and enterprise repositories are all available from within the ‘data lake’ (a Big Data construct where many different types of data reside).
It also enables the democratisation of analytics, whereby business users (without formal data management training) use advanced analytics tooling to access these lakes and draw out insight which would otherwise be locked away in separate repositories and subject to arcane access rules. The final piece of the jigsaw is the Cloud, the place where non-traditional data can be sourced and managed into a format which forms part of your overarching lake of Big Data. This can be done elastically and on a pay-per-use basis, lowering upfront costs and de-risking investment in new technologies.
Exploring the real prospect of Data as a Service
The concept of a business services presentation of data is not a new one. The Service Oriented Architecture (SOA) discipline, which Gartner coined in 1996, included a data services layer as a core component of the IT landscape: part of an ecosystem of loosely coupled components which share data definitions, standards and Quality of Service to deliver both data and functionality to a business need. This is one of the bases of the business case: once users are clear on the quality of the information delivered, they can do more for themselves, reducing the need for IT support; more intuitive exploration of the data then increases take-up and further reduces the reliance on IT support.
With the coming together of Big Data, the cloud (particularly hybrid clouds, which reside both off site and on premise) and initiatives such as the Open Data Platform and Apache Atlas, Data as a Service is now a real prospect. In its most basic form, data (or more correctly, information) can be delivered to a broad range of business applications (most notably analytics, but also operational uses such as CRM and supply chain management) without the user being intimately aware of the structures and the provenance of the data – they are only interested in the information itself and the fact that it is delivered at a given ‘quality of service’. This quality of service can be broken down into such measurable qualities as ‘currency’ (how up to date the underlying data is), ‘accuracy’, ‘completeness’ and ‘filiation’ (where the information has come from and how it has been nurtured).
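These quality-of-service attributes can be pictured as structured metadata attached to each delivered dataset. As a minimal, purely illustrative sketch in Python (the class and field names here are assumptions for this article, not a standard or a product API):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class QualityOfService:
    """Illustrative quality attributes for one delivered dataset."""
    currency: datetime   # when the underlying data was last refreshed
    accuracy: float      # estimated fraction of records verified correct (0-1)
    completeness: float  # fraction of expected records present (0-1)
    filiation: str       # lineage: where the data came from and how it was nurtured

# Example: a daily CRM extract, cleansed before landing in the lake
qos = QualityOfService(
    currency=datetime(2015, 6, 1, tzinfo=timezone.utc),
    accuracy=0.98,
    completeness=0.85,
    filiation="CRM extract -> cleansing -> data lake",
)
```

In practice this metadata would be published alongside the data by the service itself, so consumers can inspect it before deciding whether the dataset is fit for their purpose.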
For a given use case, it may be perfectly reasonable to compromise on one or more of these qualities whilst focusing on another (e.g. for call centre analytics, the currency of the information matters more than its completeness). The beauty of a service for delivering data is that users can ‘throttle back’ on one or more qualities, making informed decisions about the compromises they are making. In a finance function, for example, you may be willing to wait for data which is more accurate but less current. Using such an approach and focusing on timeliness rather than complete accuracy, a motor manufacturer has been able to better predict the number of returned faulty vehicles and make a better judgement about the appropriate action to take at the time of the return.
In addition to the data/information itself, ‘metadata’ is provided which describes the quality of the information. It should be possible to select information from a menu of varying degrees of quality, with the user making a selection based on business need; clearly any analytics or reporting, and the decisions based upon them, would inherit the qualities of the data provided.
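Selecting from such a ‘menu’ might then look like filtering a catalogue of dataset versions against the caller’s quality floor. The catalogue structure and function below are hypothetical sketches of the idea, not a description of any product:

```python
# Hypothetical catalogue: each entry is one version of the same dataset,
# published with different quality characteristics.
catalogue = [
    {"version": "realtime", "age_hours": 1,   "completeness": 0.70, "accuracy": 0.90},
    {"version": "daily",    "age_hours": 24,  "completeness": 0.95, "accuracy": 0.97},
    {"version": "audited",  "age_hours": 168, "completeness": 0.99, "accuracy": 0.999},
]

def select_dataset(catalogue, max_age_hours=None, min_completeness=0.0, min_accuracy=0.0):
    """Return the most current version meeting the caller's quality requirements."""
    candidates = [
        d for d in catalogue
        if (max_age_hours is None or d["age_hours"] <= max_age_hours)
        and d["completeness"] >= min_completeness
        and d["accuracy"] >= min_accuracy
    ]
    # Among the versions that qualify, prefer the freshest.
    return min(candidates, key=lambda d: d["age_hours"]) if candidates else None

# A call-centre analyst trades completeness for currency:
print(select_dataset(catalogue, max_age_hours=2)["version"])     # realtime
# A finance user insists on accuracy and is willing to wait:
print(select_dataset(catalogue, min_accuracy=0.999)["version"])  # audited
```

The two calls illustrate the ‘throttling back’ described above: each consumer states the qualities they care about, and the decisions they make downstream inherit the quality of the version they chose.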
In summary, a services-based approach to the delivery of data (and the functions which operate upon it) allows users to exploit the full promise of Big Data, whilst at the same time supporting the imperative to apply appropriate quality controls to the underlying data.
IBM is helping our clients deploy and support Data as a Service solutions as a first step on the Big Data path. Typically, the first place to start is a small initiative to prove out a particular hypothesis – made possible by quickly bringing together a multitude of different data types into a test area, supported by advanced analytics (for example by using a Data Science approach).