Meeting the data needs of artificial intelligence


Artificial intelligence (AI) is playing an increasingly critical role in business. By 2020, 30 percent of organizations that fail to apply AI will not be operationally and economically viable, according to one report[1]. And in a survey, 91 percent of infrastructure and operations leaders cited data as a main inhibitor of AI initiatives[2]. What does a data professional need to know about AI and its data requirements to support their organization's AI efforts?

Many factors have converged in recent years to make AI viable, including the growth of processing power and advances in AI techniques, notably in the area of deep learning (DL). Unlike traditional programming, in which a programmer provides the computer with each step it must take to accomplish a task, deep learning requires the computer to learn for itself. In the case of visual object recognition, for example, there is no way to program a computer with the steps needed to recognize a given object, which may appear in different locations, at different angles, in different lighting conditions, and perhaps partially obscured by other objects. Instead, the computer is trained on thousands of example images containing the object until it can consistently recognize it.
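The contrast between explicit rules and learning from examples can be seen even in a toy model. The sketch below trains a single perceptron, the simplest possible learned classifier, to separate labeled points instead of being told the rule directly. This is a hypothetical illustration only; real object recognition uses deep neural networks trained on thousands of images, not a perceptron on 2-D points.

```python
import random

# Toy sketch of "learning from examples" rather than explicit rules:
# a single perceptron learns to classify points as above or below the
# line y = x, purely from labeled examples.

random.seed(0)

# Labeled training examples: label 1 if the point lies above y = x.
data = []
for _ in range(200):
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    data.append(((x, y), 1 if y > x else 0))

w = [0.0, 0.0]  # weights, learned from the data
b = 0.0         # bias
lr = 0.1        # learning rate

for _ in range(20):  # training epochs
    for (x, y), label in data:
        pred = 1 if w[0] * x + w[1] * y + b > 0 else 0
        err = label - pred  # 0 when the prediction is already correct
        w[0] += lr * err * x
        w[1] += lr * err * y
        b += lr * err

correct = sum(
    (1 if w[0] * x + w[1] * y + b > 0 else 0) == label
    for (x, y), label in data
)
print(f"training accuracy: {correct / len(data):.0%}")
```

The program never contains the rule "y > x"; it recovers an approximation of it from the labeled data, which is the essence of the training process described above, scaled down by many orders of magnitude.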

This kind of training requires lots of data. One recommendation is to start with at least 100,000 examples, and each example can be large: an image or a voice recording, for instance. Different stages of training and deploying a deep learning system have different data and processing requirements. The training stage may involve years of accumulated data and can take weeks or even months to complete. In contrast to these extended time frames, once deployed, the system may need to respond in seconds.
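A quick back-of-envelope calculation shows how fast these volumes add up. The figures below are illustrative assumptions, not measurements: 100,000 examples (the recommended starting point above) at an assumed 5 MB per image.

```python
# Back-of-envelope sizing for a deep learning training set.
# Hypothetical figures: 100,000 examples at an assumed 5 MB each.
num_examples = 100_000
mb_per_example = 5

total_mb = num_examples * mb_per_example
total_gb = total_mb / 1024
print(f"raw training set: ~{total_gb:,.0f} GB")  # ~488 GB

# Augmented copies, checkpoints, and intermediate artifacts often
# multiply the raw figure, so plan capacity well beyond this.
```

Even at these modest per-example sizes, a starter dataset approaches half a terabyte before accounting for derived data, which is why storage capacity dominates the training-stage requirements discussed next.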

Given the data volumes involved, storage capacity is clearly an important consideration during the training stage. The data may also be in different formats in different systems, so multi-protocol capability may be needed, and it may be geographically dispersed, an additional factor the storage system must handle. Once the system is deployed, fast access to the data becomes particularly important to support the response-time requirements of users and applications.

A system such as IBM Spectrum Scale is perfectly suited to meeting these requirements.  It is a high-performance system that can scale out to handle petabytes or exabytes of data.  It supports a wide variety of protocols for accessing files or objects. For Hadoop applications, it provides direct access to data without having to copy the data to HDFS, as is usually required. Avoiding the overhead of copying data between systems lowers costs by saving space and also speeds time to results.

IBM Spectrum Scale is a software-defined solution that can be deployed on a customer’s choice of platform, or it can be delivered as a complete solution in the form of IBM Elastic Storage Server (ESS).  The capacity and performance capabilities of IBM Spectrum Scale and ESS are well illustrated by the US Department of Energy CORAL project, currently on track to build the world’s fastest supercomputer.  ESS will be providing the 250PB of storage the system requires, with performance requirements that include 2.5 TB/second single stream IOR and the creation of 2.6 million 32K files per second.

IBM Spectrum Scale and IBM Elastic Storage Server undergo constant improvement.  The latest version of IBM Spectrum Scale incorporates enhancements to the install and upgrade process, the GUI, and system health capabilities, along with scalability and performance tuning for Transparent Cloud Tiering up to one billion files, and file audit logging enhancements.

Meanwhile, ESS now offers models incorporating the superior performance of IBM Spectrum Scale version 5.0 with performance improvements designed to meet the requirements of the CORAL supercomputer.  ESS is also bringing out its first hybrid models incorporating both flash and disk storage in a single unit, allowing improved handling of different kinds of data such as video and analytics within a single environment.

Constant improvements, along with decades of experience in the most challenging customer environments, ensure that IBM Spectrum Scale and IBM Elastic Storage Server will continue to lead the way in managing the data that is a key element in the success of any deep learning project. Visit our website to learn more about IBM Spectrum Scale and IBM Elastic Storage Server.

[1] Gartner Predicts 2018: Compute Infrastructure 

[2] Gartner AI State of The Market – and Where HPC intersects

Product Marketing Manager, IBM Software-Defined Storage
