Date: 2024-10-28

A deeper understanding of biology has the potential to transform how healthcare is delivered, how people are screened, and how clinical conditions are treated and monitored. The exponential growth of biomedical datasets in recent years has led to the identification of a great number of molecular signatures that are vital to realizing the era of personalized diagnosis and treatment. We are at an inflection point, having witnessed a 100,000-fold reduction in cost since the human genome was first sequenced in 2001. Today, data volumes are increasing at a rate similar to the rate at which sequencing costs are decreasing.

In fact, the sequencing cost per human genome had decreased from nearly 100,000 USD to just 200 USD by September 2022. High-throughput sequencing technology, notably next-generation sequencing (NGS) platforms, has led to a multiomics revolution. Processing the terabytes or even petabytes of increasingly complex omics data generated by NGS platforms has necessitated the development of omics informatics.

Working with increasingly large and complex datasets raises several significant challenges:

Scale of data integration: It is projected that tens of millions of whole genomes will be sequenced and stored in the next five years. Most individual omics informatics tools and algorithms focus on solving a specific problem, which is usually part of a large project. This forces organizations to integrate multiple tools into a single pipeline to serve various goals.

Multimodal data: Omics data come from different (usually siloed) sources and in different formats, from raw sequences and signals to high-resolution images and mass spectrometry.

Analytical requirements: Once the data has been brought onto a single platform, and the tools have been assembled into a pipeline, computational techniques must be deployed to interpret data.

That’s where the next problem lies: sequence alignment, assembly and variant discovery are computationally intensive tasks on which interpretation and other downstream analysis depend, so they are critical to overall accuracy. To address this challenge, IBM Consulting is working with partners such as Amazon Web Services (AWS), which is focused on providing a platform and tool set for processing omics data in a secure, scalable and cost-optimized manner.