August 12, 2021 | Written by: Tommie Hallin
I just finished reading IBM’s Science & Technology Outlook (STO) 2021. The report starts with a statement that really resonated with me.
“COVID-19’s impact on the world has emphasized the importance of science.”
Reading the report, what stands out to me is the emphasis on scientific method and approach to discovery, as well as the accelerated need to scale discovery to build knowledge and make decisions.
“At the same time, science is experiencing a sea change of its own, with data and artificial intelligence being used in new ways to break through long-standing bottlenecks in scientific discovery.” (IBM STO 2021)
Besides the method for discovery, you also need something to discover from, namely data.
The STO 2021 report sparked a lot of thoughts in relation to some of my favorite topics: Data and Science.
Challenges of making data available
Being a positive but also realistic person, my first thoughts go to some of the challenges of making data available for scientific discovery, or any discovery for that matter. These challenges are far from new; rather, they have been around for as long as I can remember: making data available at the right time, in the right place, with known quality.
Timing is becoming more challenging, as the data has to be current even for discovery and analytical purposes. As stated in the STO 2021, “We need science to move faster.” In practice, this means continuously reducing the time from when data is created to when it is available for another purpose, usually referred to as data latency.
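To make the notion concrete, data latency can be expressed as the elapsed time between a record's creation and its availability for reuse. A minimal sketch (the timestamps and function name here are hypothetical, purely for illustration):

```python
from datetime import datetime, timezone

def data_latency_seconds(created_at: datetime, available_at: datetime) -> float:
    """Latency: time from when data is created to when it becomes
    available for another purpose (e.g. discovery or analytics)."""
    return (available_at - created_at).total_seconds()

# Hypothetical example: a record created at 12:00 UTC that only lands
# in the analytical store at 12:05 UTC has a latency of 300 seconds.
created = datetime(2021, 8, 12, 12, 0, tzinfo=timezone.utc)
available = datetime(2021, 8, 12, 12, 5, tzinfo=timezone.utc)
print(data_latency_seconds(created, available))  # 300.0
```

Reducing this number, end to end and continuously, is what “moving faster” means in data terms.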
Where to place the data?
Where the data needs to be is also getting more challenging as new types of collaborations across enterprises and their partners develop. The STO 2021 notes that “Accelerated discovery requires integration of multiple complex workflows with different experts, implementers, and stakeholders”. The answer to this challenge is often “data needs to be available in the cloud”, but a better answer is perhaps “data available anywhere”. So, when defining data placement, consider the data collaboration challenges and ensure the placement is flexible, dynamic, and easy to move and share. However, this is not only a data placement challenge; it is also very much a data security challenge.
Known data quality
And finally, delivering data with known quality, meaning that it is validated and described from the consumer's perspective. This is especially challenging when combined with the two aspects above. Although the STO 2021 does not specifically call out data quality as a challenge, any increase in data use will increase the challenge of data quality. The paper does, however, bring up other data governance aspects: “Putting values into practice requires a “by-design” mindset, infusing privacy, security, and ethical considerations into our engineering and technology development—from the very outset.”
In the end, the data needs to be made available in a way that is easily understood and consumable by the person or application that requires it, whether for discovery or other purposes.
The power of the cloud
Back to my positive attitude: I believe there are new ways and technology advancements to handle the growing needs of data and analytics, utilising the power of cloud capabilities:
- Possibility to manage and store data in a diverse and flexible way
- Dynamic scalability and conditional workload distribution of compute
- Cost reduction by optimizing storage for the type of data and consumption, combined with pay-as-you-go commercial models
I also believe there is an opportunity to consolidate data management further in a growing number of cases by delivering for both operational and analytical needs, as the requirements overlap to a large extent.
Fit for purpose data
Summarizing, my key thoughts and reflections from reading IBM’s STO 2021 are that there is a lot business can apply from scientific approaches and methods, but also that we should take an even broader view of data needs and ensure the data is fit for purpose:
- Fit for purpose from a quality standpoint. Is the data statistically sound for the intended use?
- Fit for purpose from a timing perspective. Is the data current enough to form the basis for the intended use?
- Fit for purpose for the user. Whatever you discover in data, it needs to be understood in order to act on it!
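The three aspects above can be read as a simple checklist applied to each dataset before use. A minimal sketch, where the record layout, field names, and thresholds are all hypothetical assumptions, not anything prescribed by the STO 2021:

```python
from datetime import datetime, timedelta, timezone

def is_fit_for_purpose(record: dict, max_age: timedelta, required_fields: set) -> bool:
    """Check the three 'fit for purpose' aspects: quality, timing, and usability.
    The field names ('created_at', 'description') are illustrative assumptions."""
    # Quality: the fields the intended use relies on must be present and non-null.
    if any(record.get(field) is None for field in required_fields):
        return False
    # Timing: the data must be current enough for the intended use.
    age = datetime.now(timezone.utc) - record["created_at"]
    if age > max_age:
        return False
    # Usability: a description must exist so the consumer can understand the data.
    return bool(record.get("description"))

fresh_record = {
    "created_at": datetime.now(timezone.utc) - timedelta(minutes=5),
    "value": 42,
    "description": "Sensor reading, degrees Celsius",
}
stale_record = {
    "created_at": datetime.now(timezone.utc) - timedelta(hours=2),
    "value": 42,
    "description": "Sensor reading, degrees Celsius",
}
print(is_fit_for_purpose(fresh_record, timedelta(hours=1), {"value"}))  # True
print(is_fit_for_purpose(stale_record, timedelta(hours=1), {"value"}))  # False
```

The point is not the specific checks, but that each “fit for purpose” question is answerable and testable per intended use.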
Then a final thought appears: if the purpose is really to discover new things in data, data in any shape should be considered. However, my experience tells me that you need to define how the data would be fit for purpose, even for discovery, in order to understand and act on the findings you make.
Closing with a quote from STO 2021, which in my opinion summarises the relevance and importance of both Science and Data:
“The pandemic has highlighted the potential of science both to produce critical breakthroughs and serve as a rigorous methodology to build knowledge and make decisions.”
Please share your thoughts and reflections on data and accelerated discovery!
Link to the STO: