Building an open seamless science cloud

A Q&A with Ezra Silvera

Ezra Silvera is an IBM researcher who has been tackling the challenge of massive-scale data in the cloud’s virtual machines for over a decade. In plain English, he tries to get groups of computers to work together as one virtual machine and process petabytes of data, without anyone knowing these computers are actually sitting continents apart.

Recently, Ezra presented IBM’s vision for the design of the Helix Nebula Science Cloud. IBM is one of the four finalists in this effort, and its vision for a sustainable science cloud must be capable of serving Europe’s biggest research centers, including CERN, EMBL, ESA, and PIC.

When did you start getting interested in cloud technologies?

Ezra Silvera: For the last 12 years, my research has been focused on areas related to system management and virtualization, including network storage and cloud computing. I’ve been heavily involved in projects around OpenStack and more recently on containers and Docker.

Ezra Silvera, Staff Member at IBM Research – Haifa

Tell us about the Helix Nebula Science Cloud

EZ: Helix Nebula is a new, pioneering partnership between leading IT providers and some of Europe’s biggest research centers, CERN, EMBL, ESA and PIC, to chart a course towards sustainable cloud services for the research communities. This effort is known as the European Open Science Cloud (EOSC) initiative. The vision of the EOSC is to offer Europe’s 1.7 million researchers and 70 million science and technology professionals a virtual environment with open and seamless services for storage, management, analysis, and re-use of research data across borders and scientific disciplines, free at the point of use.

The scientists at CERN and similar research organizations need an infrastructure that is built specifically for their calculations, with computers connected for high performance. Their experiments run jobs that can take days or months, processing enormous amounts of data.

For example, the CERN particle accelerator generates more than 25 petabytes (25,000,000,000,000,000 bytes or 25,000 terabytes) of new data per year. The data is then distributed to more than 100 data centers across the globe for further analysis.
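The unit arithmetic behind the 25-petabyte figure can be checked with a short sketch. It assumes decimal (SI) units, where one terabyte is 10^12 bytes and one petabyte is 10^15 bytes; the function names are illustrative and not part of any CERN or IBM tooling.

```python
# Decimal (SI) storage units: 1 TB = 10**12 bytes, 1 PB = 10**15 bytes.
# These helpers are hypothetical and exist only to illustrate the arithmetic.
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15


def petabytes_to_bytes(pb: int) -> int:
    """Convert a whole number of petabytes to bytes."""
    return pb * BYTES_PER_PB


def petabytes_to_terabytes(pb: int) -> int:
    """Convert a whole number of petabytes to terabytes."""
    return pb * BYTES_PER_PB // BYTES_PER_TB


annual_bytes = petabytes_to_bytes(25)
annual_terabytes = petabytes_to_terabytes(25)
print(f"{annual_bytes:,} bytes")     # 25,000,000,000,000,000 bytes
print(f"{annual_terabytes:,} TB")    # 25,000 TB
```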

Personally, I find it fascinating to work on super-challenging problems together with real client needs. The fact that these clients are top EU research institutes makes the project especially interesting.

What unique requirements make this science cloud so challenging?

EZ: These research institutions want to move to an infrastructure that takes advantage of new cloud trends such as the ‘pay as you use’ model, high performance computing in the cloud, and the ability to elastically use unlimited resources as needed by the data.

The data being used by these organizations is varied and massive, touching areas like the human genome, astrophysics, physics and more. It’s not just a matter of opening more VMs in the cloud; it involves creating a system that can be reconfigured on the fly to meet different needs as the computing jobs change.

The Helix Nebula Science Cloud defined four challenge areas:

  • Transparent data access means that jobs running on virtual machines in the cloud should ‘feel’ as though they are running in the on-premises data center.
  • An innovative pricing model will need to handle new approaches like spot instances, resource auctions, or scheduling certain jobs when prices are lower.
  • Dedicated communication lines will serve the research institutes.
  • Identity management means that scientists in the various organizations will keep their existing accounts and access privileges, only now in the cloud.
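To make the idea of scheduling jobs when prices are lower concrete, here is a minimal sketch. It assumes a hypothetical hourly price forecast is available as a simple list and that a job must run for a fixed number of consecutive hours; the function name and the sample prices are invented for illustration and do not describe the actual Helix Nebula pricing design.

```python
from typing import Sequence, Tuple


def cheapest_window(prices: Sequence[float], duration: int) -> Tuple[int, float]:
    """Return (start_hour, total_cost) of the cheapest run of consecutive hours.

    `prices` is a hypothetical per-hour price forecast and `duration` is the
    number of consecutive hours the job needs.
    """
    if duration <= 0 or duration > len(prices):
        raise ValueError("duration must be between 1 and len(prices)")

    # Cost of the first window, then slide it one hour at a time.
    window_cost = sum(prices[:duration])
    best_start, best_cost = 0, window_cost
    for start in range(1, len(prices) - duration + 1):
        # Add the hour entering the window, drop the hour leaving it.
        window_cost += prices[start + duration - 1] - prices[start - 1]
        if window_cost < best_cost:
            best_start, best_cost = start, window_cost
    return best_start, best_cost


# Invented sample forecast: price per hour for the next eight hours.
hourly_prices = [5, 4, 2, 1, 1, 3, 6, 7]
start, cost = cheapest_window(hourly_prices, 3)
print(start, cost)  # 2 4
```

A real scheduler would also have to weigh deadlines, data locality, and the risk that a spot instance is reclaimed, none of which this sketch models.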

In some ways, this is really the pinnacle of cloud and information technology. I consider it a privilege to participate in this initiative.
