Mark III Systems solves a unique biomedical data challenge
Biomedical research often relies on large-scale data collection. Researchers handling genomic and other biological data don’t think in terms of gigabytes or terabytes, but in petabytes (PB). A single PB contains around 500 billion pages of standard printed text. Just transferring information at this scale, let alone storing and processing it, is a huge undertaking.
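As a rough sanity check on that comparison, the arithmetic can be sketched as follows. This assumes a decimal petabyte (10^15 bytes); the per-page figure it produces is only what the quoted 500 billion pages implies, not a number taken from the project.

```python
# Back-of-the-envelope check of the "500 billion pages per PB" comparison.
# Assumption: a decimal petabyte (10**15 bytes), not a binary pebibyte.
PETABYTE_BYTES = 10**15
PAGES_PER_PETABYTE = 500 * 10**9  # figure quoted in the text


def bytes_per_page(total_bytes: int = PETABYTE_BYTES,
                   pages: int = PAGES_PER_PETABYTE) -> float:
    """Return the average page size implied by the comparison."""
    return total_bytes / pages


if __name__ == "__main__":
    print(f"Implied page size: {bytes_per_page():.0f} bytes")
```

The result, about 2,000 bytes per page, is in line with a page of plain text stored at roughly one byte per character.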
Data growth in this area shows no sign of slowing. In fact, research institutions are creating more data every day as their processing capabilities improve. They need new solutions to store and share it as they strive to find new treatments. Mark III, an IBM Platinum Business Partner since 1995*, won the IBM 2020 Outstanding Storage Systems Solution Beacon Award for its work helping one university to tackle this problem.
Mark III’s client was a prominent university medical center with a history of combining biomedical research and clinical care. It prided itself on its ability to translate science-driven research into new clinical treatments that could benefit 105,000 hospitalized patients, 370,000 emergency room cases and three million outpatient visits each year.
The data-intensive computing work that supported its research helped to save lives. That compelled it to streamline and accelerate its research as much as possible.
The applications and data architectures involved in research-focused IT were so demanding that the university’s conventional IT department couldn’t keep up. The university created a dedicated biomedical high-performance computing (HPC) department, staffed by specialists with expertise in both biomedical research and IT. This department took over the job of data and infrastructure management from the IT division.
The new department inherited a huge problem: a chaotic mix of fragmented legacy systems. Research systems had developed organically across more than a dozen separate departments, creating over 20 data silos spanning various disciplines including genomics, oncology, radiology, emerging treatments, and general-purpose research.
“They were entirely independent,” recalls Stan Wysocki, President of Mark III Systems. “In some cases, they were workstations sitting under people’s desks.”
This made it difficult to share data across departments and initiatives, locking up much of its inherent value. It also left duplicate data in different silos.
The HPC department had to nurture these silos, configuring hardware and software to support independent departmental standards that were incompatible with everyone else’s. That sapped time and energy that could have been better spent elsewhere, slowing down critical research processes.
“A lot of the research projects would take days, weeks, or even months to put together,” Wysocki says. “The department would have to provision the systems and the storage. Then they would turn those resources over to the researchers.”
There were already 2PB of data spread around these silos, and the problem was destined to get worse as more departments enlisted the HPC department’s services. The number of data silos that it had to manage was increasing along with the demand for the GPU-enabled systems that could manage AI workloads. That threatened to drain more money from research initiatives as it struggled to manage this infrastructure.
“The big challenge for them was around supporting additional researchers by managing data as the volume of information grew,” Wysocki says. The HPC department also needed a way to share the growing volume of data between different research disciplines.
The obvious answer was a data lake that could scale to meet the considerable demands of multiple research departments with their own HPC clusters. In response, Mark III Systems developed its High Performance Computing as a Service (HPCaaS) solution.
Research data as a service
HPCaaS uses IBM’s Elastic Storage Server (ESS) and Spectrum Scale Data Management Edition (DME) products, pulling together all the disparate data into a unified repository.
Mark III’s storage solution is modular and expandable. It began with four petabytes of usable storage but expanded to 15PB. It supports the HPC department’s plans to reach 20PB or more in 2020-2021.
With the raw storage taken care of, Mark III still needed to facilitate access. It wrote extra code using the Red Hat OpenShift container platform and Ansible, allowing researchers from different departments to work simultaneously on that data using their own computing platform.
The company based its system on a DevOps methodology that would enable research departments to stand up research data environments quickly using the data lake without contacting the HPC department.
Keeping things secure
Security was another important part of this DevOps methodology. Some of the data was sensitive personal information, and the university’s security team needed assurance that it was properly protected. In response, Mark III built a framework called SecureESS that constantly checked the data lake for security and compliance issues.
“They love this product because they have visibility through the wrapper that our team wrote,” Wysocki says. “It gives them visibility into who’s using the data and they can report back that it’s secure.”
Hybrid cloud technology was a key part of this solution. It uses IBM Transparent Cloud Tiering (TCT) to provide secure, location-independent access to the research data sets.
A faster, more flexible research system
The result was a more fluid research process in which researchers could set up their own projects.
“Now, you go into their workspace, click on what it is that you want to build from a science perspective, upload the data that you need to upload or link to the data that’s already there on that storage system, and you run your job,” Wysocki explains. “It’s really a research data as a service model.”
This new way of working relies on a dramatic increase in performance. The university has 24GB/sec of high-speed throughput over an InfiniBand network across its two sites, reducing the latency of research jobs. This has contributed to a reduction of up to 40% in job time.
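To put the 24GB/sec figure in context, the sketch below estimates how long it would take to move data sets of the sizes mentioned in this story at that rate. It is an idealized calculation only: it assumes decimal units, a fully sustained transfer rate, and no protocol or storage overhead, none of which is stated in the source. The function name and labels are illustrative.

```python
# Idealized transfer-time estimate at the quoted 24 GB/sec throughput.
# Assumptions (not from the source): decimal units, sustained full rate,
# and no protocol, metadata, or storage overhead.
GB = 10**9
PB = 10**15
THROUGHPUT_BYTES_PER_SEC = 24 * GB


def transfer_time_seconds(data_bytes: float,
                          throughput_bytes_per_sec: float) -> float:
    """Return the time in seconds to move data_bytes at a constant rate."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / throughput_bytes_per_sec


if __name__ == "__main__":
    # Data sizes taken from the article: legacy silos, initial and
    # expanded usable capacity.
    for label, size_pb in [("legacy silos", 2),
                           ("initial capacity", 4),
                           ("expanded capacity", 15)]:
        hours = transfer_time_seconds(size_pb * PB,
                                      THROUGHPUT_BYTES_PER_SEC) / 3600
        print(f"{label} ({size_pb} PB): about {hours:.1f} hours")
```

Even under these best-case assumptions, moving petabytes takes many hours or days, which is one reason a shared repository that researchers can link to in place is attractive.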
The unified storage system also helped reduce the time spent on data management by 30%, helping the university to reduce overall operating costs.
The system is already delivering real-world results. A data scientist from Mark III participated in a medical hackathon, where the project used AI-powered computer vision to reduce the time that children needed to spend under anesthetic in an MRI machine for heart surgery. This can help reduce the effects of anesthesia on child brain development and enable doctors to handle more patients.
Building business through a partnership with IBM
This is the kind of project that would have been difficult to accomplish without IBM’s technology, explains Wysocki. “The idea from the ESS was really around performance and scale. It’s very hard to find storage that will let you keep that much data and be able to retrieve and write to it that quickly.”
The project drove USD 1.2 million in IBM storage and software-defined storage revenues. Mark III’s modular design also made the project repeatable, enabling other clients to configure HPCaaS to meet their own needs. The extra value offered by the HPCaaS framework gives Mark III and IBM an advantage over competitors who compete purely on price. This has opened up multiple opportunities, creating a USD 5 million revenue pipeline from HPCaaS in 2020.
Thanks to IBM’s unique storage technology, Mark III can help its university clients cope not only with the flood of data already at their disposal, but with the deluge yet to come. That can translate into real health outcomes that can help save countless lives in the future.
*Mark III Partners. Accessed 25 September 2020. https://www.markiiisys.com/partners/