IBM all-flash storage and data management solutions for AI

By | 4 minute read | August 1, 2018

AI is impacting every industry, and “by 2020, 30 percent of organizations that fail to apply AI will not be operationally and economically viable”. The ecosystem of AI, machine learning (ML) and deep learning (DL) frameworks and patterns continues to mature, which drives enterprise adoption through supported software platforms such as IBM PowerAI Enterprise. As AI projects move from experimentation to multi-model production, infrastructure requirements place increased performance and data management demands on the storage systems that support AI/ML/DL.

AI/ML/DL development is data intensive. As outlined in the IBM Systems Reference Architecture for AI Infrastructure, the multi-step workflow uses historical data sets iteratively to predict results on new data. Once established, the data sets grow exponentially in value and use. As data science teams become proficient with a growing set of training data, more opportunities arise to adopt AI across the organization. This can happen in several ways: by running the models on new data, by creating new AI/ML/DL applications trained on the current data sets, by integrating AI components into other applications, or all of the above.

data flow graphic 1, AI Data Management

As organizations look to move AI/ML/DL into production, they need to consider how these data access patterns evolve and how the choice of storage affects the experience of data scientists and users. The speed of accessing and manipulating this data becomes critical, and low-latency, all-flash storage becomes paramount in several phases of AI adoption and production.

data flow graphic 2, AI Data Management

The need for all-flash is highest in the inference step. Inference is the production delivery of the AI application or its APIs on incoming new data. In common AI use cases, such as chatbots, recommendation engines, and audio and visual recognition, AI responsiveness is critical to the experience.

During training, the majority of data access is large file data transfers that may be locally cached in the compute and GPU servers. However, as the number of models and users grows, serialized large file access becomes randomized data access. High-capacity, all-flash systems are critical as organizations expand their development of AI/ML/DL and move to a shared data service with multiple models and teams that need access to data sets. High-performing, high-capacity storage with advanced functionality is necessary to eliminate the overhead of redundant copies and excessive data movement, so that teams work with more relevant and less stale data.
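The shift from serialized to randomized access can be illustrated with a small simulation. This is a minimal sketch, not a model of any IBM system: the helper names (`sequential_stream`, `interleave`, `sequential_fraction`), the block addresses and the strict round-robin arrival order are all assumptions made for illustration. Each reader scans its own file front to back, yet the shared storage sees almost no back-to-back block requests once several readers are active.

```python
from itertools import zip_longest


def sequential_stream(start_block, length):
    """Block addresses for one reader scanning its file front to back."""
    return list(range(start_block, start_block + length))


def interleave(streams):
    """Round-robin merge of requests (an idealized arrival order, assumed here)."""
    merged = []
    for group in zip_longest(*streams):
        merged.extend(block for block in group if block is not None)
    return merged


def sequential_fraction(addresses):
    """Share of requests that target the block right after the previous one."""
    if len(addresses) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(addresses, addresses[1:]) if cur == prev + 1)
    return hits / (len(addresses) - 1)


# One training job reading one large file: fully sequential at the storage layer.
one_reader = interleave([sequential_stream(0, 1000)])

# Four jobs reading four different files at once (hypothetical, widely spaced offsets).
four_readers = interleave([sequential_stream(i * 100_000, 1000) for i in range(4)])

print(sequential_fraction(one_reader))
print(sequential_fraction(four_readers))
```

Real arrival orders are burstier than strict round-robin, so some sequential runs usually survive, but the trend is the same: more concurrent models and users mean a more random pattern at the shared data service.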

Depending upon the workload, data ingest will also benefit greatly from dedicated all-flash storage, although in some cases distributed data stores that scale horizontally are the preferred choice. For example, IBM Cloud Object Storage can handle oceans of sensor data that can be readily managed, manipulated, analyzed, reduced and extracted. In contrast, for real-time data streaming into a time series database, dedicated all-flash storage is an excellent choice.

The IBM Storage portfolio of all-flash storage and solutions serves the needs of small to massive AI/ML/DL projects. The most recent addition to the portfolio is IBM FlashSystem 9100, a transformative storage solution with extreme performance, density and software-enabled features. In a 2U enclosure it can pack the equivalent of 2 PB of data and deliver over a million IOPS and 34 GB/s of throughput at 100 microseconds of latency. This system can tackle the most demanding requirements with the latest NVMe technology enabled by advanced Spectrum Storage software.
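To put the headline figures in perspective, a back-of-the-envelope calculation with Little's Law (requests in flight = arrival rate × time in system) relates IOPS and latency to the concurrency a workload must sustain. This is a rough sketch under stated assumptions: it treats the quoted peak numbers as if they held at the same time, which the text does not claim, and the function names are illustrative only.

```python
def outstanding_ios(iops, latency_us):
    """Little's Law: average requests in flight = rate * time in system."""
    return iops * latency_us / 1_000_000


def avg_io_size_bytes(throughput_bytes_per_s, iops):
    """Average request size implied by a given throughput and request rate."""
    return throughput_bytes_per_s / iops


IOPS = 1_000_000                 # "over a million IOPS" (taken as exactly 1M here)
LATENCY_US = 100                 # 100 microseconds
THROUGHPUT_BPS = 34_000_000_000  # 34 GB/s, decimal gigabytes assumed

queue_depth = outstanding_ios(IOPS, LATENCY_US)
io_size = avg_io_size_bytes(THROUGHPUT_BPS, IOPS)

print(f"I/Os in flight needed: {queue_depth:.0f}")
print(f"Implied average I/O size: {io_size / 1000:.0f} KB")
```

Under these assumptions, roughly 100 requests must be in flight on average to reach a million IOPS at 100 microseconds each, which is why many parallel clients or deep queues are needed to drive such a system to its peak.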

IBM offers solutions built upon the FlashSystem 9100 to streamline and secure data access. The first of these is IBM Cloud Private, which manages the provisioning of persistent storage and containers. The other is IBM Cloud Private for Data, which provides an abstraction and control layer to manage and secure data with access from behind the firewall to the cloud. These IBM solutions, powered by FlashSystem 9100, can deliver the essential private cloud service fabric for building and managing on-premises containerized applications, and can accelerate your journey to AI by simplifying hybrid data management, data governance and business analytics with a single interface. Organizational leaders can rapidly discover insights from their core business data while keeping it in a protected, controlled environment.

IBM FlashSystem® 9100 combines the performance of flash and Non-Volatile Memory Express (NVMe) with the reliability and innovation of IBM FlashCore® technology and the rich features of IBM Spectrum Virtualize™ in one powerful storage platform. You can choose either IBM FlashCore Modules (FCM), with the line-speed internal performance, multi-dimensional data protection and innovative flash management features provided by FlashCore technology, or industry-standard NVMe flash drives. IBM Spectrum Virtualize offers industry-leading data services such as data reduction pools, FlashCopy management, data mobility and high-performance data encryption. It can bring more than 440 different IBM and non-IBM storage systems under a single management model, creating one easily managed storage resource. Encryption and many other services can now be applied to data whether or not the existing systems natively offer these features.

Organizations are preparing their infrastructures for the growth of data and artificial intelligence. IBM PowerAI Enterprise, IBM Cloud Private for Data and IBM Storage provide the capabilities they need to become leaders in the AI transformation.