To advance our understanding of the human genome, scientists must process vast amounts of data. However, many research centers struggle to cope with the volume of data they generate. How could L7 help?
L7 teamed up with IBM to build an HPC environment in the cloud, leveraging IBM® Spectrum™ technology for flexible, highly scalable data storage and user-friendly workload management.
96% reduction in the runtime of a standard genome analysis pipeline
1/3 the price of using commodity solutions to perform the same work at scale
2 weeks from conceptual design to fully functional IBM HPC environment in the cloud
Business challenge story
Delving deeper into the human genome
Genomics, the study of an organism’s complete set of DNA, is a crucial area of scientific research. By studying the human genome and gaining a better understanding of our DNA, scientists hope to develop new ways to diagnose, treat, cure and even prevent many diseases.
To gain new insights, scientists must process and analyze enormous volumes of genomics data, which many research centers and institutes struggle to keep up with.
Chris Mueller, Founder of L7 Informatics, takes up the story: “One of the biggest challenges genomics research presents is the sheer volume of data that's generated: a single instrument run can produce over 100 GB of data. Transferring so much data from lab equipment to short- and long-term storage can be difficult to manage.
“Another significant challenge is making data available to scientists in a timely manner. We work with many sequencing core facilities that pre-process DNA samples in preparation for examination by scientists. We wanted to make it quicker and easier for facilities to deliver data to scientists. After all, the more seamless and efficient this process is, the faster scientists can get on with their research.”
Taking genomics research to the cloud
The partnership between L7 and IBM produced an end-to-end, high-performance Genomic Cloud designed for fast, effective management and analysis of genomics data at scale. The Genomic Cloud can support up to 500 compute nodes and more than 10 petabytes of storage, helping to deliver faster results and optimized resource usage for HPC applications.
Chris Mueller comments: “Most of our customers currently have their own on-premises infrastructure to store and analyze the data generated by their lab equipment. We set out to develop a more flexible, scalable alternative: HPC in the cloud. Essentially, we wanted to offer our customers a very familiar computing environment, but delivered as a cloud service.”
Chris Mueller says: “Working as a team, we were able to combine all of our existing software assets and bioinformatics expertise with IBM’s extensive hardware and cloud services. We use a range of IBM Cloud solutions, which gives us access to data centers all over the world, so we can locate the cloud solutions near to where our customers are based, and transfer data locally from their sites to the cloud.”
He adds: “IBM Spectrum Scale provides high-performance data storage that we can scale quickly and easily. Built-in tiering capabilities allow a lot of flexibility in how we move data around, enabling customers to seamlessly migrate data from lab instruments up to the cloud for analysis and long-term storage.
“IBM Spectrum LSF, meanwhile, offers everything we need for HPC workload management in a single package, from job scheduling tools to resource management capabilities. It gives us the tools to manage the L7 Genomic Cloud as a complete HPC environment rather than just as a virtual machine and associated storage layer, providing intelligent, policy-driven scheduling and improved visibility to increase throughput.”
The L7 software sits on top of the IBM Spectrum Computing software and provides domain modeling tools for managing life science and genomics applications, including laboratory management software. The L7 Genomic Cloud also harnesses a full suite of open source tools for genomic analysis, and L7 provides domain support as a service to ensure everything runs smoothly for customers.
L7 collaborated closely with IBM to get the new solution up and running in a very short timeframe.
“Creating an entire HPC stack was surprisingly easy—the IBM team was just incredible in putting together the solution for us,” notes Chris Mueller. “We were able to go from the conceptual design to having a working solution up and running in only two weeks. We were pretty amazed at how fast the development process was, and how stable the solution was once we got it going.”
Speeding up scientific research
By combining L7's unique expertise in the bioinformatics space with IBM's HPC experience, the joint team successfully developed a genomics-specific data management solution that offers cost-effectiveness, flexibility, simplicity, and scalability.
Chris Mueller states: “Based on our basic cost competitiveness analysis, our solution works out at about a third of the price of using commodity solutions to perform the same work at scale. This means that scientists can spend less money on IT, and focus more on research.
“In addition, Spectrum Scale provides users with a great deal of flexibility in where, when and how they choose to store and use data. A significant benefit for collaboration is the global namespace, which allows everyone around the world to access the data when they need it and extract a lot more scientific information out of it. The solution can also handle high volumes of unstructured data and delivers the performance benefits of parallel data access without bottlenecks, which is vital for data-intensive genomics research.”
The L7 Genomic Cloud is also very user-friendly, as Chris Mueller explains: “For a customer that's already using IBM Spectrum Computing solutions, this is a very natural transition. It's simply a matter of moving your data center over to the IBM Cloud, and then you can take advantage of all the other services we offer as part of the solution. And as a typical HPC environment—albeit in the cloud—users are already familiar with how it works. We’re offering customers the familiar HPC experience, but as a more scalable, more cost-effective cloud service.”
Furthermore, the solution delivers substantial gains in performance and speed.
“By working with IBM, we have access to a very broad range of services that we can provide to customers of this solution—far beyond what a commodity cloud can offer,” says Chris Mueller. “With IBM Spectrum Scale, IBM Spectrum LSF, and targeted software optimizations, we were able to take a standard genomics workflow and dramatically increase performance.
“For example, we were able to cut the runtime of one standard genome analysis pipeline down from 24 hours to just over an hour—a time saving of 96 percent. This hugely accelerates the speed and efficiency with which users can process DNA samples, which will help to shorten time-to-insight in research projects.
“Looking to the future, we want to take greater advantage of being part of the IBM Cloud ecosystem by providing more of its services and analytics tools to our customers via the L7 Genomic Cloud.”
Chris Mueller concludes: “The combination of our bioinformatics expertise with IBM’s technical resources and experience was vital in creating the L7 Genomic Cloud, which will enable scientists to work faster and more efficiently—helping to advance our understanding of genomics.”
About L7 Informatics
L7 Informatics provides software and services that enable synchronized solutions for science and health. L7’s novel Enterprise Science Platform (ESP) is a scientific process and data management (SPDM) solution that enables life science and healthcare companies to connect people, processes, and systems to accelerate discoveries and drive precision healthcare. If you are interested in learning more about L7 solutions, check out the L7 Genomic Cloud white paper.