Building an open seamless science cloud

A Q&A with Ezra Silvera

Ezra Silvera is an IBM researcher who has been tackling the challenge of massive-scale data in the cloud’s virtual machines for over a decade. In plain English, he tries to get groups of computers to work together as one virtual machine and process petabytes of data, without anyone knowing that these computers are actually sitting continents apart.

Recently, Ezra presented IBM’s vision for the design of the Helix Nebula Science Cloud. IBM is one of the four finalists in this effort, and its vision for a sustainable science cloud must be capable of serving Europe’s biggest research centers, including CERN, EMBL, ESA, and PIC.

When did you start getting interested in cloud technologies?

Ezra Silvera: For the last 12 years, my research has focused on areas related to system management and virtualization, including network storage and cloud computing. I’ve been heavily involved in projects around OpenStack and, more recently, around containers and Docker.

Ezra Silvera, Staff Member at IBM Research – Haifa

Tell us about the Helix Nebula Science Cloud.

EZ: Helix Nebula is a new, pioneering partnership between leading IT providers and some of Europe’s biggest research centers, CERN, EMBL, ESA and PIC, to chart a course towards sustainable cloud services for the research communities. This effort is known as the European Open Science Cloud (EOSC) initiative. The vision of the EOSC is to offer Europe’s 1.7 million researchers and 70 million science and technology professionals a virtual environment with open and seamless services for storage, management, analysis, and re-use of research data across borders and scientific disciplines, free at the point of use.

The scientists at CERN and similar research organizations need an infrastructure that is built specifically for their calculations, with computers connected for high performance. Their experiments run jobs that can take days or months, processing enormous amounts of data.

For example, the CERN particle accelerator generates more than 25 petabytes (25,000,000,000,000,000 bytes, or 25,000 terabytes) of new data per year. The data is then distributed to more than 100 data centers across the globe for further analysis.
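The unit conversion behind the 25-petabyte figure can be checked in a few lines. This is only a sketch: it assumes decimal (SI) units, where 1 TB = 10^12 bytes and 1 PB = 10^15 bytes, and the function names are made up for illustration.

```python
# Assumption: decimal (SI) units, not binary (PiB/TiB) units.
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15


def petabytes_to_bytes(pb: int) -> int:
    """Convert a whole number of petabytes to bytes."""
    return pb * BYTES_PER_PB


def petabytes_to_terabytes(pb: int) -> int:
    """Convert a whole number of petabytes to terabytes."""
    return pb * BYTES_PER_PB // BYTES_PER_TB


yearly_pb = 25  # yearly data volume quoted in the interview
print(f"{petabytes_to_bytes(yearly_pb):,} bytes")
print(f"{petabytes_to_terabytes(yearly_pb):,} TB")
```

In decimal units, 25 petabytes is 25,000 terabytes.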

Personally, I find it fascinating to work on super-challenging problems that come from real client needs. The fact that these clients are top EU research institutes makes the project especially interesting.

What unique requirements make this science cloud so challenging?

EZ: These research institutions want to move to an infrastructure that takes advantage of new cloud trends such as the ‘pay as you use’ model, high performance computing in the cloud, and the ability to elastically use unlimited resources as needed by the data.

The data being used by these organizations is varied and massive, touching areas like the human genome, astrophysics, physics and more. It’s not just a matter of opening more VMs in the cloud; it involves creating a system that can be reconfigured on the fly to meet different needs, as the computing jobs change.

The Helix Nebula Science Cloud defined four challenge areas:

  • Transparent data access: jobs running on virtual machines in the cloud should ‘feel’ as though they are running in the on-premises data center.
  • An innovative pricing model that can handle new approaches like spot instances, resource auctions, or scheduling certain jobs when prices are lower.
  • Dedicated communication lines for the research institutes.
  • Identity management, so that scientists in the various organizations keep their existing accounts and access privileges when they move to the cloud.
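To make the pricing challenge more concrete, here is a minimal sketch of the "schedule jobs when prices are lower" idea. It is not the Helix Nebula design. It assumes a hypothetical hourly price forecast (integers in arbitrary price units) and a deferrable job that needs a fixed number of consecutive hours, and it picks the cheapest window. The function name and the sample prices are invented for illustration.

```python
from typing import Sequence


def cheapest_start_hour(prices: Sequence[int], duration: int) -> int:
    """Return the start index of the cheapest run of `duration` consecutive hours.

    `prices` is a hypothetical hourly forecast; ties go to the earliest window.
    """
    if duration <= 0 or duration > len(prices):
        raise ValueError("duration must be between 1 and len(prices)")

    # Cost of the first window, then slide it one hour at a time.
    window_cost = sum(prices[:duration])
    best_cost, best_start = window_cost, 0
    for start in range(1, len(prices) - duration + 1):
        # Add the hour entering the window, drop the hour leaving it.
        window_cost += prices[start + duration - 1] - prices[start - 1]
        if window_cost < best_cost:
            best_cost, best_start = window_cost, start
    return best_start


# Invented forecast: price per VM-hour for the next six hours.
hourly_prices = [12, 10, 5, 4, 6, 11]
start = cheapest_start_hour(hourly_prices, duration=3)
print(f"Start the 3-hour job at hour {start}")
```

A real scheduler would also have to weigh deadlines, data transfer costs, and the risk of a spot instance being reclaimed. The sliding window only shows the basic trade-off between waiting and price.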

In some ways, this is really the pinnacle of cloud and information technology. I consider it a privilege to participate in this initiative.
