
Data Democratization – making data available

November 24, 2021 | Written by: Tommie Hallin

Categorized: AI


One of the trending buzzwords of recent years in my world is “Data Democratization”, which this year seems to have been joined by “Data Fabric” and “Data Mesh”. What it is really about is the long-standing challenge of making data available. It is another one of those topics that often gets the reaction “How hard can it be? Just give me the data…”. It sounds easy, but controlling your data as you take it out of the application context where it was created is a challenge.

There is also a rapidly increasing understanding of the value of data, which drives the push for making it available. However, this understanding was not there when many of the applications that generate the data were built. Modern applications typically have convenient ways of making data available (e.g. through APIs). This is not as easy or straightforward with legacy (older) applications. Adding applications delivered through the cloud “as a service” typically introduces further challenges, for example limited direct access to the data.

Key challenges of making data available

Making data available often involves taking data out of the application where it was created. It often also involves creating a copy of the data and sending it to another database or other location. This step might be solved in other ways, e.g. through virtualization or data mesh capabilities. In any case, it raises questions of ownership and involves giving up some control of the data. It also raises additional questions of responsibility for making the data useful in another context:

Ensuring that data is used and interpreted correctly (and, for some data, in line with regulations such as data privacy).
The risk of exposing data issues that are not visible in the application, and the need to cover for application-specific logic that is applied at the time of consumption.
Technically getting the data out of the application with reasonable effort and performance (again, either physically or virtually).

How to prepare for data democratization?

The key to data democratization and making data available is metadata: making sure the data is described, tagged and organised properly. This has to be done above and beyond what is necessary in the application where the data was created in the first place. In its original application, a lot of context is typically given “for free”: it is understood what the data means and how it is supposed to be used. When the data is decoupled from the application, that context is lost and needs to be added to the metadata. This is additional effort that the originating application does not need, but that data democratization very much does.
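To make the idea of “context in the metadata” more tangible, here is a minimal sketch in Python of what a dataset description might capture once it leaves its application. The class names, fields, the "personal-data" tag and the example "orders" table are all hypothetical and not the schema of any particular data catalog product; the point is only that meaning, sensitivity and intended use must be written down explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    name: str
    description: str  # business meaning that was implicit inside the source application
    tags: list[str] = field(default_factory=list)  # hypothetical tags, e.g. "personal-data"


@dataclass
class DatasetMetadata:
    name: str
    source_application: str  # where the data was created (keeps the link to its origin)
    description: str
    fit_for_purpose: list[str]  # uses the data is known to be suitable for
    columns: list[ColumnMetadata] = field(default_factory=list)

    def undocumented_columns(self) -> list[str]:
        """Columns whose meaning has not yet been captured outside the application."""
        return [c.name for c in self.columns if not c.description.strip()]

    def sensitive_columns(self) -> list[str]:
        """Columns tagged as personal data, e.g. for privacy regulations."""
        return [c.name for c in self.columns if "personal-data" in c.tags]


# Hypothetical example: an "orders" table taken out of an order-entry application.
orders = DatasetMetadata(
    name="orders",
    source_application="order-entry",
    description="One row per customer order as captured at order entry.",
    fit_for_purpose=["sales reporting"],
    columns=[
        ColumnMetadata("order_id", "Unique identifier assigned by the order-entry system."),
        ColumnMetadata("status", "Order lifecycle state; 'X' means cancelled."),
        ColumnMetadata("customer_email", "", tags=["personal-data"]),
    ],
)

print(orders.undocumented_columns())  # ['customer_email']
print(orders.sensitive_columns())     # ['customer_email']
```

The `status` description shows the kind of knowledge (“'X' means cancelled”) that users of the application take for granted but that a consumer of the extracted data cannot guess.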

Another important aspect is setting the appropriate expectations for data quality: for what purposes is the data fit to be used? The quality of data is an entire topic of its own, which I don’t intend to cover in this article. Related to data quality is also any application-specific logic that is applied at the time of consumption. This logic needs to be accounted for when the data is made available outside of the application context. More thoughts on this can be found in my article on making data fit for purpose:

https://www.linkedin.com/pulse/what-does-take-make-data-fit-purpose-tommie-hallin

It is also important to establish an appropriate way to get data out of the place where it was created. As described before, the purpose is to use the data in a different context than the application where it was created. This does not necessarily mean that the data has to be physically moved, even if that is often the case, because data from different sources usually needs to be joined and transformed to make sense, which can be too complex to do at the time of consumption. But with advances in data virtualization and data mesh, there are now alternatives to physically extracting and transforming the data into a separate repository. In any case, this should ideally make no difference to the user of the data; it is rather a matter of the technical solution for making the data available and democratized.
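The difference between moving data physically and making it available virtually can be illustrated with a small sketch using Python’s built-in `sqlite3` module. This is only an analogy under simple assumptions: two hypothetical tables stand in for separate source applications, a SQL view plays the role of virtual access, and a `CREATE TABLE ... AS SELECT` copy plays the role of a separate physical repository. Real data virtualization or data mesh platforms work very differently under the hood.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical source tables standing in for data from separate applications.
conn.executescript("""
CREATE TABLE crm_customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE erp_invoices (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO crm_customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO erp_invoices VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
""")

# The join and transformation that make the data useful in a new context.
JOIN_SQL = """
SELECT c.name AS customer, SUM(i.amount) AS revenue
FROM crm_customers AS c
JOIN erp_invoices AS i ON i.customer_id = c.customer_id
GROUP BY c.name
"""

# Virtual: a view stores only the query; the join runs every time it is read.
conn.execute("CREATE VIEW customer_revenue_virtual AS " + JOIN_SQL)
# Physical: the result is computed once and copied into a separate table.
conn.execute("CREATE TABLE customer_revenue_physical AS " + JOIN_SQL)


def read(name: str) -> list[tuple]:
    """Consumers query both variants in exactly the same way."""
    return conn.execute(f"SELECT customer, revenue FROM {name} ORDER BY customer").fetchall()


print(read("customer_revenue_virtual"))   # [('Acme', 150.0), ('Globex', 75.0)]
print(read("customer_revenue_physical"))  # [('Acme', 150.0), ('Globex', 75.0)]
```

For the consumer, both variants look the same. The trade-off lies in where the cost is paid: the view recomputes the join on every read but always reflects the sources, while the physical copy is cheap to read but has to be refreshed when the sources change.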

What is the solution for making data available?

A “Data Catalog” is usually the “answer”, the “silver bullet”. But as in almost all cases, a tool in itself is not the solution, nor does it solve the entire problem. In my experience, the main challenge is finding a way to let go of control of the data, which is hard sometimes for legal reasons, sometimes for practical reasons and sometimes for emotional reasons.

Data Ownership is a major challenge once the data is taken out of the application context: who owns the data then? My view is that the owner of the data in the application context should continue to own the data in its untransformed format. Then, as the data is transformed and joined with other data, the ownership needs to change, and a new set of Data Owners needs to be established for data that is decoupled from applications. But since many data quality issues need to be brought back to where the data is created, it is important to keep that link and responsibility connected. This data ownership structure usually leads to a discussion about a “Data Office” and the emerging role of the Chief Data Officer (CDO), which is a separate but closely related topic. Have a look at this blog series on the CDO topic:

https://www.ibm.com/blogs/journey-to-ai/2020/12/ibm-chief-data-and-technology-officer-summit-series/
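The ownership structure described above, with application owners for untransformed data, new Data Owners for derived data, and a preserved link back to the source, can be sketched as a simple lineage graph. The following Python sketch is purely illustrative: the `Dataset` class, the `source_owners` helper and the example owners are hypothetical, not part of any specific governance tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    owner: str  # accountable Data Owner for this dataset
    # Datasets this one was derived from; empty for untransformed application data.
    sources: list[Dataset] = field(default_factory=list)


def source_owners(dataset: Dataset) -> set[str]:
    """Walk the lineage back to untransformed source data and return its owners.

    Data quality issues found in a derived dataset can be routed to these
    owners, since many issues need to be fixed where the data is created.
    """
    if not dataset.sources:
        return {dataset.owner}
    owners: set[str] = set()
    for src in dataset.sources:
        owners |= source_owners(src)
    return owners


# Hypothetical example: two application datasets joined into a derived one.
crm = Dataset("crm_customers", owner="CRM application owner")
erp = Dataset("erp_invoices", owner="ERP application owner")
revenue = Dataset("customer_revenue", owner="Finance data owner", sources=[crm, erp])

print(revenue.owner)                   # Finance data owner
print(sorted(source_owners(revenue)))  # ['CRM application owner', 'ERP application owner']
```

The derived dataset has its own owner, yet the lineage still tells you whose application a quality issue has to be fixed in.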

Another key to the solution is to establish a data practice: a data team with the skills and resources to continuously manage the processes of decoupling data from applications, applying logic and transformations, and making the data available to data consumers. Its overall task is handling the ever-growing appetite for data that is driving the need for data democratization.

Three key things to think about when it comes to Data Democratization:

It is never as easy as “just get the data out and give it to me”. Be prepared to put in the effort to make data available.
Data needs to be properly tagged and described to be useful outside of the application context.
Data Ownership and a Data Practice need to be in place and cover the ambitions of data democratization (and yes, a data catalog tool is very helpful).

I see many organisations taking action to establish the foundations needed for data democratization, and I have the pleasure of being part of some of those transformations.

Please share your thoughts on data democratization.

Tommie Hallin
