A deeper understanding of biology has the potential to transform how healthcare is delivered, how people are screened, and how clinical conditions are treated and monitored. The exponential growth of biomedical datasets in recent years has led to the identification of a great number of molecular signatures that are vital to realizing the era of personalized diagnosis and treatment. We are at an inflection point: the cost of sequencing has fallen roughly 100,000-fold since the human genome was first sequenced in 2001. Today, data volume is increasing at a rate similar to the rate at which sequencing cost is decreasing.
In fact, the sequencing cost per human genome has decreased from nearly $100 million in 2001 to just $200 in September 2022. High-throughput sequencing technology, notably next generation sequencing (NGS) platforms, has led to a multiomics revolution. Processing the terabytes or even petabytes of increasingly complex omics data generated by NGS platforms has necessitated the development of omics informatics.
As datasets grow larger and more complex, organizations face several significant challenges:
Scale of data integration: It is projected that tens of millions of whole genomes will be sequenced and stored in the next five years. Most individual omics informatics tools and algorithms focus on solving a specific problem, which is usually part of a large project. This forces organizations to integrate multiple tools into a single pipeline to serve various goals.
Multimodal data: Omics data come from different — usually siloed — sources and in different formats, from raw sequences and signals to high-resolution images and mass spectrometry.
Analytical requirements: Once the data has been brought onto a single platform, and the tools have been assembled into a pipeline, computational techniques must be deployed to interpret data.
That’s where the next problem lies: omics data analysis tasks such as sequence alignment, assembly and variant discovery are computationally intensive, and because interpretation and other downstream analysis depend on them, they are critical to overall accuracy. To solve this challenge, IBM Consulting is working with partners like Amazon Web Services (AWS), who are focused on providing a platform and tool set for processing omics data in a secure, scalable and cost-optimized manner.
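To give a feel for the scale behind "tens of millions of whole genomes", here is a minimal back-of-envelope sketch. The 50 GB per genome figure and the genome counts are assumptions chosen purely for illustration; real footprints vary widely with coverage depth, file format and compression.

```python
# Back-of-envelope storage estimate for a population-scale sequencing program.
# All inputs are illustrative assumptions, not measured or vendor-quoted values.

GB_PER_PB = 1_000_000  # decimal units: 1 PB = 1,000,000 GB


def raw_storage_petabytes(genomes: int, gb_per_genome: float) -> float:
    """Return total raw storage in petabytes for `genomes` whole genomes."""
    return genomes * gb_per_genome / GB_PER_PB


if __name__ == "__main__":
    # Hypothetical scenarios: 10M to 50M genomes at an assumed 50 GB each.
    for genomes in (10_000_000, 25_000_000, 50_000_000):
        pb = raw_storage_petabytes(genomes, gb_per_genome=50.0)
        print(f"{genomes:>11,} genomes -> {pb:,.0f} PB")
```

Under these assumed inputs, raw sequence data alone reaches hundreds of petabytes before any derived files or annotations are counted.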
What is Amazon Omics?
Amazon Omics is a HIPAA-eligible, GDPR-compliant, purpose-built service that helps healthcare and life science organizations and their software partners store, query and analyze genomic, transcriptomic and other omics data, and then generate insights from that data to improve health and advance scientific discoveries. Using Amazon Omics, clients are scaling multimodal and multiomic analyses, generating insights from omics profiles, images, medical claims and health records data processed together with other AWS services, such as Amazon HealthLake, Amazon Comprehend Medical and Amazon Transcribe Medical.
Accelerate genomic innovation
Some organizations continue to debate whether a cloud journey is right for them, and others have yet to present a business case to their executive teams. However, the healthcare and biopharma communities tend to agree that pursuing population-level multimodal and multiomic research is cost-prohibitive on traditional data center infrastructure. Challenges include vast amounts of raw data, the spiky nature of workloads, the multitude of tools used, strict security and compliance requirements, the need for cross-industry collaboration and time to market.
With a low price per gigabase and the ability to efficiently store, index and secure petabytes of raw sequence data, Amazon Omics is a no-brainer solution for storing heterogeneous omics data at scale on a pay-as-you-go model.
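The cost argument for spiky workloads can be sketched with simple arithmetic. The monthly demand figures and the unit cost below are made-up placeholders, not AWS or data center pricing; the point is only the shape of the comparison between sizing for peak demand and paying per use.

```python
# Compare provisioning for peak demand with paying only for what is used.
# Every number here is a made-up placeholder, not AWS or data-center pricing.

def fixed_capacity_cost(monthly_demand, unit_cost):
    """Cost when capacity is sized for the busiest month, all year round."""
    return max(monthly_demand) * unit_cost * len(monthly_demand)


def pay_as_you_go_cost(monthly_demand, unit_cost):
    """Cost when each month is billed for that month's usage only."""
    return sum(monthly_demand) * unit_cost


# Hypothetical spiky workload: compute units needed in each of 12 months.
demand = [10, 10, 200, 15, 10, 180, 10, 10, 10, 220, 10, 15]

fixed = fixed_capacity_cost(demand, unit_cost=1.0)
on_demand = pay_as_you_go_cost(demand, unit_cost=1.0)
utilization = sum(demand) / (max(demand) * len(demand))

print(f"sized for peak: {fixed:.0f}")
print(f"pay as you go:  {on_demand:.0f}")
print(f"utilization of peak-sized capacity: {utilization:.1%}")
```

In practice the per-unit price of on-demand capacity is usually higher than that of owned hardware, so the comparison should be rerun with realistic rates for both sides; the sketch only shows why low utilization drives the gap.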
The next step in the data journey is performing analytics on the ingested data and delivering the output in interoperability-ready formats through reproducible and scalable pipelines. This purpose-built service offers automated analysis workflows, so that various types of data beyond sequencing data (images, records, claims and more) can be brought in for analysis using Amazon Athena, Amazon EMR, Amazon SageMaker and Amazon QuickSight.
Scientists can finally focus on science and let the undifferentiated heavy lifting be handled by intelligent, scalable and purpose-built cloud services and solutions, eliminating opportunities for error that arise from manual or disparate data processing. Lastly, the output of sequencing analysis is a set of genomic variants presented as massive semi-structured files (Variant Call Format, or VCF, and genomic VCF, or gVCF, files), which are then usually annotated (“assigned meaning”). Analyzing annotated data at scale requires a query-ready data schema, which is another benefit that Amazon Omics provides: the output data can be received as an Apache Iceberg table. This process alone saves hundreds of hours of productive time. Because Amazon Omics is aware of file formats like FASTQ, BAM and CRAM, clients can focus on their data and bring in workflow definition languages like WDL, letting Amazon Omics take care of the rest. Clients get built-in, attribute-based controls to define fine-grained data access policies and enact effective governance, along with comprehensive logging and data provenance to track data access. All of these features help accelerate time to market while supporting the development of a secure, scalable omics processing pipeline.
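To make "query-ready schema" concrete, here is a minimal sketch of flattening variant records into one row per alternate allele. It is not how Amazon Omics implements its variant handling; the function names (parse_info, flatten_vcf) and the two sample records are illustrative assumptions, and only the eight fixed VCF columns are handled.

```python
import io
from typing import Dict, Iterator, TextIO


def parse_info(info: str) -> Dict[str, str]:
    """Turn 'DP=14;AF=0.5;DB' into {'DP': '14', 'AF': '0.5', 'DB': 'true'}."""
    if info == ".":
        return {}
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value if value else "true"  # flags carry no value
    return fields


def flatten_vcf(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one flat row per alternate allele from the fixed VCF columns."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, vid, ref, alt, qual, flt, info = line.split("\t")[:8]
        for allele in alt.split(","):
            yield {
                "chrom": chrom,
                "pos": int(pos),
                "id": None if vid == "." else vid,
                "ref": ref,
                "alt": allele,
                "qual": None if qual == "." else float(qual),
                "filter": flt,
                "info": parse_info(info),
            }


# Illustrative records only; the second one is multi-allelic (ALT = G,T).
SAMPLE = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2",
    "20\t1110696\trs6040355\tA\tG,T\t67\tPASS\tNS=2;DP=10;AF=0.333,0.667",
])

rows = list(flatten_vcf(io.StringIO(SAMPLE)))
for row in rows:
    print(row["chrom"], row["pos"], row["ref"], row["alt"], row["info"].get("DP"))
```

Once each allele is its own row with typed columns, the data maps naturally onto a table that SQL engines can filter by position, gene or annotation. This sketch leaves per-allele INFO values such as AF unsplit and ignores per-sample genotype columns, both of which a production schema would need to handle.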
Transforming the future of genomic data analysis
IBM Consulting, a Premier Consulting Partner for AWS with 18,000+ AWS certified professionals across the globe, 16 service validations and 15 AWS competencies, is at the leading edge of innovation, helping life sciences clients such as Moderna, Genomics England, Chugai, Johnson & Johnson and others innovate across the life sciences value chain. IBM Consulting is a proven consulting partner for life science organizations, with solutions ranging from R&D, supply chain and manufacturing to sustainability and quantum computing. This is one of the reasons IBM Consulting was awarded the Global Innovation Partner of the Year and the GSI Partner of the Year for Latin America at AWS re:Invent 2022, cementing the trust that clients and AWS place in IBM Consulting as a partner of choice.
Leveraging its genomics experience, IBM has published a whitepaper, “Explainable AI reveals changes in skin microbiome composition linked to phenotypic differences,” and has also invested in building an accelerator that enables researchers to perform phenotype prediction from omics data (e.g., gene expression or microbiome data) and any tabular data (e.g., clinical data) using a range of machine learning models. Combining these assets, this experience and modern tools like Amazon Omics, IBM Consulting can help you accelerate your genomics data analysis journey.
As a testament to IBM Consulting’s expertise in this area, IBM Consulting has been selected as a strategic technology partner for Genomics England over 18 months. With AWS as the hyperscaler of choice, Genomics England’s services are growing rapidly, providing researchers with access to genomic datasets to enable scientific discovery. IBM Consulting is helping Genomics England in areas such as refining the clinical user interface and enabling swift access to genomic data stores, supporting the adoption of cloud capabilities into a hybrid cloud, and running large-scale, stable, supportable and sustainable IT services. Read how IBM is helping Genomics England drive transformation and build the capability and technology to enable more evidence-based diagnostics and treatments for patients.
To capitalize on the benefits of data analytics and digital collaboration in genomics, life sciences companies must have a forward-thinking data and delivery model strategy, navigate the biopharma and academic community to form an ecosystem, develop a secure and resilient genomics data architecture, and establish a value realization program. IBM delivers this to our business partners through Operating Model Transformation, Tech and Data/AI Strategy, AI at Scale and Genomics Data Architecture offerings.
Use a new service to your advantage: have a strategy
The decision to use Amazon Omics and other AWS analytics services is easy: it removes the need for data scientists, researchers and clinicians to set up or maintain tools, workflows and infrastructure. However, solving for how the data will be stored, processed and analyzed is only one part of the answer. It is up to executives to define what new product, new service, new partnership or other competitive advantage this service can open the door to. Reach out to the IBM Life Sciences and Enterprise Strategy teams for perspective on integrated business and technology strategy and on the integration of AWS for Health solutions.