To advance our understanding of the human genome, scientists must process vast amounts of data. However, many research centers struggle to cope with the volume of data they generate. How could Lab7 help?
Lab7 teamed up with IBM to build an HPC environment in the cloud, leveraging IBM® Spectrum™ technology for flexible, highly scalable data storage and user-friendly workload management.
96% reduction in the runtime of a standard genome analysis pipeline
1/3 the price of using commodity solutions to perform the same work at scale
2 weeks from conceptual design to fully functional IBM HPC environment in the cloud
Business Challenge story
Delving deeper into the human genome
Genomics, the study of an organism’s complete set of DNA, is a crucial area of scientific research. By studying the human genome and gaining a better understanding of our DNA, scientists hope to develop new ways to diagnose, treat, cure and even prevent many diseases.
To gain new insights, scientists must process and analyze enormous volumes of genomics data—which many research centers and institutes struggle to keep on top of.
Chris Mueller, Founder, President, and CTO of Lab7 Systems, takes up the story: “One of the biggest challenges genomics research presents is the sheer volume of data that's generated—a single instrument run can produce over 100 GB of data. Transferring so much data from lab equipment to short and long-term storage can be difficult to manage.
“Another significant challenge is making data available to scientists in a timely manner. We work with many sequencing core facilities that pre-process DNA samples in preparation for examination by scientists. We wanted to make it quicker and easier for facilities to deliver data to scientists. After all, the more seamless and efficient this process is, the faster scientists can get on with their research.”
Taking genomics research to the cloud
The partnership between Lab7 and IBM led to the development of an end-to-end, high-performance Genomic Cloud for fast, effective management and analysis of genomics data at scale. The Genomic Cloud can support up to 500 compute nodes, with more than 10 petabytes of storage, helping achieve faster results and optimized resource usage for HPC applications.
Chris Mueller comments: “Most of our customers currently have their own on-premises infrastructure to store and analyze the data generated by their lab equipment. We set out to develop a more flexible, scalable alternative: HPC in the cloud. Essentially, we wanted to offer our customers a very familiar computing environment, but delivered as a cloud service.”
Built on the IBM Cloud platform, the Lab7 Genomic Cloud uses IBM Spectrum Scale™ and IBM Spectrum LSF to support rapid data processing and analysis.
Chris Mueller says: “Working as a team, we were able to combine all of our existing software assets and bioinformatics expertise with IBM’s extensive hardware and cloud services. We use a range of IBM Cloud solutions, which gives us access to data centers all over the world, so we can locate the cloud solutions near to where our customers are based, and transfer data locally from their sites to the cloud.”
He adds: “IBM Spectrum Scale provides high-performance data storage that we can scale quickly and easily. Built-in tiering capabilities allow a lot of flexibility in how we move data around, enabling customers to seamlessly migrate data from lab instruments up to the cloud for analysis and long-term storage.
“IBM Spectrum LSF, meanwhile, offers everything we need for HPC workload management in a single package, from job scheduling tools to resource management capabilities. It gives us the tools to manage the Lab7 Genomic Cloud as a complete HPC environment rather than just as a virtual machine and associated storage layer, providing intelligent, policy-driven scheduling and improved visibility to increase throughput.”
The Lab7 software sits on top of the IBM Spectrum Computing software and provides domain modeling tools for managing the life science and genomic applications, including laboratory management software. The Lab7 Genomic Cloud also harnesses a full suite of open source tools for genomic analysis, and Lab7 provides domain support as a service to ensure everything runs smoothly for customers.
Lab7 collaborated closely with IBM to get the new solution up and running in a very short timeframe.
“Creating an entire HPC stack was surprisingly easy—the IBM team was just incredible in putting together the solution for us,” notes Chris Mueller. “We were able to go from the conceptual design to having a working solution up and running in only two weeks. We were pretty amazed at how fast the development process was, and how stable the solution was once we got it going.”
Speeding up scientific research
By combining Lab7's unique expertise in the bioinformatics space with IBM's HPC experience, the joint team successfully developed a genomics-specific data management solution that offers cost-effectiveness, flexibility, simplicity, and scalability.
Chris Mueller states: “Based on our basic cost competitiveness analysis, our solution works out at about a third of the price of using commodity solutions to perform the same work at scale. This means that scientists can spend less money on IT, and focus more on research.
“In addition, Spectrum Scale provides users with a great deal of flexibility in where, when and how they choose to store and use data. A significant benefit which enables collaboration is the global namespace which allows everyone around the world to access the data when they need it and extract a lot more scientific information out of it. The solution can also handle high volumes of unstructured data and demonstrate performance benefits of parallel access to data with no bottlenecks, which is vital for data-intensive genomics research.”
The Lab7 Genomic Cloud is also very user-friendly, as Chris Mueller explains: “For a customer that's already using IBM Spectrum Computing solutions, this is a very natural transition. It's simply a matter of moving your data center over to the IBM Cloud, and then you can take advantage of all the other services we offer as part of the solution. And as a typical HPC environment—albeit in the cloud—users are already familiar with how it works. We’re offering customers the familiar HPC experience, but as a more scalable, more cost-effective cloud service.”
Furthermore, the solution offers superior performance and speed.
“By working with IBM, we have access to a very broad range of services that we can provide to customers of this solution—far beyond what a commodity cloud can offer,” says Chris Mueller. “With IBM Spectrum Scale, IBM Spectrum LSF, and targeted software optimizations, we were able to take a standard genomics workflow and dramatically increase performance.
“For example, we were able to cut the runtime of one standard genome analysis pipeline down from 24 hours to just over an hour—a time saving of 96 percent. This hugely accelerates the speed and efficiency with which users can process DNA samples, which will help to shorten time-to-insight in research projects.
“Looking to the future, we want to take greater advantage of being part of the IBM Cloud ecosystem by providing more of its services and analytics tools to our customers via the Lab7 Genomic Cloud.”
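As a quick check on the headline figure, the runtime reduction quoted above can be reproduced with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the "after" runtime is assumed to be exactly 1 hour, whereas the case study says only "just over an hour".

```python
def runtime_reduction_percent(before_hours: float, after_hours: float) -> float:
    """Return the percentage of runtime saved when going from before to after."""
    if before_hours <= 0:
        raise ValueError("before_hours must be positive")
    return (before_hours - after_hours) / before_hours * 100.0


before = 24.0  # hours, as quoted in the case study
after = 1.0    # hours; ASSUMPTION: "just over an hour" approximated as 1.0

reduction = runtime_reduction_percent(before, after)
print(f"Runtime reduction: {reduction:.1f}%")  # prints "Runtime reduction: 95.8%"
```

Under that assumption the saving is about 95.8 percent, which rounds to the 96 percent quoted. A slightly longer actual runtime would give a marginally smaller figure.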
Chris Mueller concludes: “The combination of our bioinformatics expertise with IBM’s technical resources and experience was vital in creating the Lab7 Genomic Cloud, which will enable scientists to work faster and more efficiently—helping to advance our understanding of genomics.”
About L7 Informatics
Founded in 2012 and headquartered in Austin, Texas, L7 Informatics (previously Lab7 Systems) is a technology company focused on the development of a comprehensive, sample-to-answer workflow management software platform for data-intensive science. Its Enterprise Science Platform (ESP) offers a solution for data managers, bioinformaticians, and scientists who are struggling with the disjointed set of tools currently available to handle the increasing informatics bottleneck that arises from the broader adoption of data-rich informatics-dependent technologies. Lab7 is an IBM Business Partner.
Take the Next Step
To learn more about IBM Spectrum Computing, please contact your IBM representative or IBM Business Partner, or visit the following website: ibm.com/systems/spectrum-computing
Please read more about the Lab7 solution here: lab7.io/test/wp-content/uploads/2017/06/KUS12392-USEN-00-Lab7-Cloud-PoC-Brief.pdf