This blog is for the open exchange of ideas relating to IBM Systems, storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
Tony Pearson is a Master Inventor, Senior IT Architect and Event Content Manager for [IBM Systems for IBM Systems Technical University] events. With over 30 years with IBM Systems, Tony is frequent traveler, speaking to clients at events throughout the world.
Lloyd Dean is an IBM Senior Certified Executive IT Architect in Infrastructure Architecture. Lloyd has held numerous senior technical roles at IBM during his 19 plus years at IBM. Lloyd most recently has been leading efforts across the Communication/CSI Market as a senior Storage Solution Architect/CTS covering the Kansas City territory. In prior years Lloyd supported the industry accounts as a Storage Solution architect and prior to that as a Storage Software Solutions specialist during his time in the ATS organization.
Lloyd currently supports North America storage sales teams in his Storage Software Solution Architecture SME role in the Washington Systems Center team. His current focus is with IBM Cloud Private and he will be delivering and supporting sessions at Think2019, and Storage Technical University on the Value of IBM storage in this high value IBM solution a part of the IBM Cloud strategy. Lloyd maintains a Subject Matter Expert status across the IBM Spectrum Storage Software solutions. You can follow Lloyd on Twitter @ldean0558 and LinkedIn Lloyd Dean.
Tony Pearson's books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
The developerWorks Connections Platform is now in read-only mode and content is only available for viewing. No new wiki pages, posts, or messages may be added. Please see our FAQ for more information. The developerWorks Connections platform will officially shut down on March 31, 2020 and content will no longer be available. More details available on our FAQ. (Read in Japanese.)
This week, I am presenting at the IBM Systems Technical University in Orlando, Florida, May 22-26, 2017. Here is my recap of the sessions on the morning of Day 4.
Configurable IBM Spectrum Scale
Kent Koeninger presented IBM Spectrum Scale software, which Kent refers to as "Configurable Spectrum Scale" (or CSS for short), as opposed to the pre-built system known as Elastic Storage Server (ESS).
Why choose CSS versus ESS? Lower entry price. You can start with just two single-socket servers and a drawer of disk.
IBM Spectrum Scale was formerly called IBM General Parallel File System (GPFS). Many who tried earlier versions of GPFS found it difficult to configure, because it only had a command line interface. Now, Spectrum Scale has a fully-functional GUI, and clients have been able to install and configure Spectrum Scale in just 30 minutes!
How big can Spectrum Scale grow? As much as your budget can afford! With an architecture that can support YottaBytes of data and 900 quintillion files, you won't hit any limits anytime soon.
There are some unique capabilities of ESS not available in CSS. For example, ESS offers Spectrum Scale Native RAID (erasure coding) with fast rebuild times, and ESS is certified for SAP HANA. You can combine any combination of CSS and ESS in the same Spectrum Scale to create a "data lake" for mixed workloads.
A good use case for Spectrum Scale, either CSS or ESS, is backup. Kent explained why it is an excellent option to store backups with enterprise backup software such as IBM Spectrum Protect or Commvault.
VersaStack - Hybrid Cloud like no other
This session was jointly presented by Chris Vollmar, IBM Storage Architect, and Brent Anderson, Cisco Global Consulting Systems Engineer. IBM and Cisco have been partners for more than 25 years.
VersaStack combines Cisco UCS x86 servers, Cisco Nexus and MDS switches, and IBM FlashSystem or Spectrum Virtualize storage.
What if you have a SAN Infrastructure built entirely from IBM b-type or Brocade-based switches? Cisco supports their SAN switches for this, but nobody has tested VersaStack in this combination, and UCS Director does not manage this combination, so IBM does not support this. Instead, for this situation, IBM recommends doing external connection via Ethernet, or using direct-attach configurations.
The Cisco Validated Design spends four months testing, and gives you bulletproof process to deploy the solution.
There is a difference between Cisco UCS Manager and UCS Director. UCS Manager is available at no additional charge, but only manages the Cisco x86 servers. UCS Director is optionally extra priced, and manages Cisco servers, Cisco networking, and IBM Spectrum Virtualize storage.
Brent explained the benefits of UCS Management through policies and profiles.
Chris covered Cisco CloudCenter, which the Cisco team shortens to just "C3". IBM Spectrum Copy Data Management can be used to move snapshots of data between on-premises and off-premises Cloud to help in Hybrid Cloud configurations.
How to Design an IBM Spectrum Scale solution
Tomer Perry, IBM Spectrum Scale I/O Development, presented this session.
For those who want to bring up a quick IBM Spectrum Scale environment to play around with, you can do this in as little as 30 minutes. But to design a mission critical deployment, additional requirements may need to be addressed. You may need to consult with not just storage admins, but also application owners, network admins and security personnel.
Large companies have hundreds or thousands of applications, so Tomer recommends to group these into "Workload families", based on data set types, access patterns and performance requirements. For NAS take-out, 80 percent of NAS I/O is "get attribute" that can easily be served directly from cache memory.
For each workload family, you may need to decide on snapshots, quotas, namespace (bind mounts, symlinks, etc.), security (ACL, encryption), estimated capacity, replication BC/DR, backup and ILM requirements.
Unless this is completely greenfield deployment, the existing infrastructure needs to be evaluated. This includes the LAN and WAN network topology, name resolution (DNS), time services (NTP), Authentication (AD, LDAP, NIS, Keystone), Keyserver (IBM SKLM), Monitoring and Migration requirements.
Tomer suggests designing the environment in this order: Cluster, File System, Storage Pools, Fileset, Replication, and finally Monitoring.
Generally, you need three NSD servers per cluster. For those licensing Spectrum Scale Standard Edition by the socket, you may be tempted to put everything into one big cluster. The new capacity-based Spectrum Scale Data Management Edition eliminates that concern, so Tomer recommends having separate computer clusters and storage clusters, connected by cross-cluster mount. All nodes in a cluster are considered an "ssh" administration domain.
A single Spectrum Control namespace can support up to 256 file systems. There are various reasons to have multiple file systems: block size, backup/recovery, snapshot, quotas, and cross-cluster isolation. If a file system gets corrupted, it will not affect other file systems. In an internal test, an "fsck" on 1 billion, 1 PB of data file system took only 30 minutes to repair.
Storage Pool design can separate metadata from content, and workloads can be separated to different storage media. With ILM, HSM and TCT, you can move colder data to Cloud, Object Storage, Spectrum Protect or Spectrum Archive.
Filesets are tree branches for each file system. IBM Spectrum Scale supports both dependent and independent filesets. Filesets can be used for Non-erasable, Non-Rewriteable (NENR) Immutability, policies, quotas, snapshots. Consider using a fileset instead of carving off a new file system.
Spectrum Scale offers both synchronous and asynchronous replication. For Synchronous, the ReadReplicaPolicy can be set to default, local or fastest. For Asynchronous, there are a variety of AFM modes (Read-only, Local-Update, Single-Writer, Independent-Writer, and Disaster Recovery). You may need to decide if your AFM gateways are dedicated or collocated. You will need to tune your TCP buffers for WAN performance to get the RPO you desire.
The nice thing about IBM solutions is that you can start small, and grow big. In all of these examples above, IBM offers sizes to match nearly any IT budget.
This week, I am presenting at the IBM Systems Technical University in Orlando, Florida, May 22-26, 2017. Here's my recap of the sessions of Day 3.
Ethernet-only SANs -- Myth or Reality?
Anuj Chandra, IBM Advisory Engineer, presented an excellent overview of Ethernet-based SANs. He started with a quick history of Ethernet, starting with Robert Metcalfe's original drawing for his concept.
In the past, Ethernet was used for email and message transfer, and so dropped packets were tolerated. However, with the use of Ethernet for SANs, many standards have been adopted to make Ethernet networks more robust. These meet requirements for Flow Control, Congestion management, low latency, data integrity and confidentiality, network isolation, and high availability.
These standards are known as IEEE 802.1Q "Data Center Bridging", including 8012.Qbb Priority Flow Control, 802.1Qaz Enhanced Transmission Selection, 802.1Qau Congestion Notification. There is also the IETF Transparent Interconnection of Lots of Links (TRILL) to replace Spanning Tree Protocol (STP). All of these features are negotiated between endpoints server and storage. Ethernet that supports these new standards is often referred to as "Converged Ethernet" since it handles both traditional email/message traffic as well as SAN data traffic.
In addition to 1GbE and 10GbE, we now have 2.5, 5, 20, 40, 50, 100 Gb Ethernet speeds. By 2020, Anuj estimates over half of all Ethernet ports will be 25 GbE or faster. Amazingly, some of these can work on existing 10BASE-T cables.
Anuj also covered Remote Direct Memory Access (RDMA), and the RDMA-capable Network Interface Cards (RNIC) that support them. In one chart, shown here, Anuj explained Infiniband, RDMA over Converged Ethernet (RoCE) and RoCE v2, and Internet Wide Area RDMA Protocol (iWARP).
While many of these enhancements were intended for Fibre Channel over Ethernet (FCoE), the beneficiary has been iSCSI. Now there is iSCSI Extensions for RDMA (iSER) to take even more advantage of these changes, and can work with Infiniband, RoCE or iWARP. All of these networks can also be used as the basis for NVMe over Fabric (NVMeOF).
Ethernet is the backbone of Cloud usage, and IBM is well positioned to take advantage of these new networking technologies.
Digital Video Surveillance solutions for extended video evidence protection
Dave Taylor, IBM Executive Architect for Software Defined Storage solutions, presented this session on Digital Video Surveillance (DVS).
Most video surveillance is either analog-based, going to standard VHS tapes, or file-based. Sadly, security guards that watch live camera feeds lose their attention span after 22 minutes.
There are an estimated 72 million cameras globally, with 1.5 million more every year.
City governments spend 57 percent of their budget on "public safety". This can include body cams for police departments. Taser International, now called AXON, dominates the body-cam market.
City budgets may not be prepared to store all of this video content into a cloud that complies with Criminal Justice Information Services (CJIS) standards. These Cloud services tend to be more expensive, as the videos must be treated as evidence, tamper-proof, and with appropriate chain of custody.
DVS is not just storing movies. IBM offers Intelligent Video Analytics. It is important to be able to derive insight and actionable response.
Storage capacity adds up quickly. Standard 1080p (1920 by 1080 pixel) camera generates 2.92 GB per hour, 70 GB per day, and over 2TB per month. If you have 1,000 cameras, that's over 2PB of data.
For xProtect servers running Windows, the Tiger Bridge Connector can be used to move the video files to either IBM Spectrum Scale or IBM Cloud Object Storage.
Deep Dive into HyperSwap for Active-Active applications and Disaster Recovery
Andrew Greenfield, IBM Global Engineer for Storage, explained the different ways HyperSwap is implemented across the IBM storage portfolio.
For IBM DS8000, HyperSwap is based on Metro Mirror synchronous replication. In the event that the primary DS8000 fails, the host server can automatically re-direct all I/O to the secondary DS8000. This is often referred to as "High Availability" (HA), and in some cases can serve as Disaster Recovery.
For IBM Spectrum Virtualize products, including SAN Volume Controller (SVC), FlashSystem V9000, Storwize V7000 and V5000 products, as well as Spectrum Virtualize sold as software, the implementation is different.
Previously, SVC offered Stretched Clusters, which put one node in one site, and a second node at another site, which allows for an Active/Active configuration. Unfortunately, the nodes in FlashSystem V9000 and Storwize are "connected at the hip", effectively bolted together, so putting separate nodes in different locations was not possible. To solve this, IBM developed HyperSwap that allows one node-pair to replicate across sites to another node-pair in the same Spectrum Virtualize cluster.
However, even though it is called "HyperSwap", it is not implemented in any way similar to the DS8000 method. Instead, Spectrum Virtualize uses the Global Mirror with Change Volumes to replicate data between sites.
IBM Storage and VMware Integration
This session was co-presented by Brian Sherman, IBM Distinguished Engineer, and Steve Solewin, IBM Corporate Solutions Architect.
For nearly two decades, IBM is a "Technology Alliance Partner" with VMware. To provide consistent integration to all the features and functions of VMware, IBM Spectrum Control Base Edition (SCBE) is provided at no additional charge for IBM DS8000, XIV, FlashSystem and Spectrum Virtualize products.
SCBE is downloadable as an RPM for RedHat Enterprise Linux (RHEL) can run bare-metal or as a VM.
For those using Hyper-Scale Manager, it will automatically install a special A-line-only version of SCBE. It will install SCBE, but it will only manage the A-line products (FlashSystem A9000, FlashSystem A9000R, XIV and Spectrum Accelerate).
Storage admins can define "storage services" that can be assigned to vCenter. This allows VMware admins to allocate storage in self-service mode.
After the meetings were over, IBM had a special event at the Universal City Walk to enjoy some drinks, food, and conversation, and to watch Blue Man Group.
This week, I am presenting at the IBM Systems Technical University in Orlando, Florida, May 22-26, 2017. Here's my recap of the afternoon sessions of Day 2.
IBM Spectrum Protect deep dive into Container Storage Pools
Ron Henkhaus, IBM Certified Consulting IT Specialist, presented the new Spectrum Protect concept of "Container Pools" that can either be "Directory Pools" on SAN or NAS-based disk storage, or "Cloud Pools". Container pools can contain deduplicated and non-dedupe data.
Ron cautioned that directory pools should not be placed on the same file system as your Spectrum Protect database or logs. Also, best practice for any directory pool is to assign an "overflow" pool to any non-directory pool, such as disk, tape or cloud container.
Cloud pools can use either OpenStack Swift, V1 Swift, Amazon S3 protocol, Amazon Web Services, IBM Bluemix, and IBM Cloud Object Storage. You can pre-define the vaults and buckets in the configuration.
For off-premises Cloud pools, the data is encrypted by default. For other container pools, encryption is optional. Performance to Cloud pools have been improved by using "accelerator storage", basically a disk cache to collect data before sending over to the Cloud pool. Backups to Cloud pools can reach 8 TB per hour. Restore times varies from 500 to 1500 GB per hour.
Container Pools were designed for the new "Deduplication 2.0" feature introduced in version 7. Traditional Dedupe 1.0 to Device Class FILE is still available, but not recommended.
Version 7.1.6 changed the compression algorithm from LZW to LZ4. In all cases, Spectrum Protect performs these actions in this order: deduplication, compression, encryption. Data that is encrypted by the Spectrum Protect client is therefore not deduped.
The "Protect Storage Pool" command can replicate a directory pool to either a remote directory pool or Cloud pool. In addition to this remote replication, you can copy a directory pool to tape to offer air-gap protection against ransomware. Such tapes are considered part of the "Copy Container Pool". In the event of directory pool corruption, the data can be repaired from either replication or tape.
IBM Aspera can now be used for replication, using SSL and AES-128 bit encryption. If your latency is greater than 50 msec, and have more than 0.5 percent packet loss, Aspera might help. This is available for Linux on x86 platforms running v7.1.6 or higher.
For existing customers, IBM Spectrum Protect allows you to convert your FILE, VTL and TAPE device class pools to directory or Cloud pools.
Introduction to IBM Cloud Object Storage (powered by Cleversafe)
In 2015, IBM acquired Cleversafe, recognized as the #1 Object Storage vendor. Their flagship product was officially renamed to the IBM Cloud Object Storage System, which some abbreviate informally as IBM COS. IBM offers the IBM Cloud Object Storage System in three ways: as software, as pre-built systems, and as a cloud service on IBM Bluemix (formerly known as SoftLayer).
Since then, IBM has been busy integrating IBM COS into the rest of the storage portfolio. I explained how IBM COS can be used for all kinds of static-and-stable data, but not suited for frequently changed data, such as Virtual machines or Databases.
Object storage can be access via NFS or SMB NAS-protocols using a gateway product, like IBM Spectrum Scale, or those from third-party partners like Ctera, Avere, Nasuni or Panzura. It can also be used as an alternative to tape for backup copies, and is already supported by the major backup software like IBM Spectrum Protect, Commvault Simpana, or Veritas NetBackup.
While other cloud service providers have offered data storage in the cloud, this new offering also allows hybrid configurations with geographically dispersed erasure coding.
Unlike RAID which protects against the loss of one or two drives, erasure coding can protect against a larger number of concurrent failures. For example, using an Information Dispersal Algorithm (IDA) of "7+5", where seven pieces of data are encoded on twelve independent disks, the system can lose up to five disk drives without losing any data.
Combining this with Geographically Dispersed Configuration across three or more sites means that you can lose an entire data center, four of the twelve disks, and still have instant full access to all of your data from eight drives at the other locations. In the graphic, you see two on-premise data centers combined with a third location in IBM SoftLayer.
New Generation of Storage Tiering: Simpler Management, Lower Costs, and Improved Performance
With ever changing amounts of storage, it is hard to find metrics that are consistent year to year. Fortunately, we found I/O density as the metric to focus my efforts, armed with real data from Intelligent Information Lifecycle Management (IILM) studies done at various clients. From that, I was able to talk about storage tiering on three fronts:
Storage tiering between Flash and disk. IBM FlashSystem and IBM Easy Tier on DS8000 and Spectrum Virtualize family for hybrid Flash-and-disk configurations.
Storage tiering between disk, tape, and Cloud. HSM and Information Lifecycle Management (ILM) on Spectrum Scale, Elastic Storage Server (ESS), Spectrum Archive and IBM Cloud Object Storage System.
Storage tiering automation across your entire environment. IILM studies can help identify a target mix of Tier 0, Tier 1, Tier 2 and Tier 3 storage. IBM Spectrum Storage Suite and the Virtual Storage Center (VSC) can recommend or perform the movement of LUNs to more appropriate tiers, based on age and I/O density measurements.
It's hard to say what the correct sequence of presentations should be. Some thought it might have been better for my talk on IBM Cloud Object Storage System prior to Ron's talk on Cloud container pools, but perhaps hearing Ron first helped drive more interest to my session.
I have been involved with Business Continuity and Disaster Recovery my entire career at IBM System Storage. However, with new workloads like Hadoop analytics and new Hybrid Cloud deployments, I thought it would be good to provide a refresh.
The need for Business Continuity and Disaster Recovery has increased recently due to (a) climate change caused by human activity, (b) ransomware and other cyber attacks, and (c) disgruntled employees.
Back in 1983, a task force of IBM clients at a GUIDE conference developed "Seven Business Continuity Tiers for Disaster Recovery", which I refer to as "BC Tiers". I divided the presentation into three sections:
Backup and Restore: BC tiers 1 through 3 are based on backup and restore methodologies. I explained how to backup Hadoop analytics data, all of the various options for IBM Spectrum Protect software, and how to encrypt the tape data that gets sent off premises.
Rapid Data Recovery: BC tiers 4 and 5 reduce the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) with snapshots, database journal shadowing, and IBM Cloud Object Storage.
Continuous Operations: BC tiers 6 and 7 provide data replication mirroring across locations. I covered 2-site, 3-site and 4-site configurations.
IBM Spectrum Virtualize - How it works - Deep dive
Barry Whyte, IBM Master Inventor and ATS for Spectrum Virtualize, covered a variety of internal topics "under the hood" of Spectrum Virtualize. This covers the SAN Volume Controller (SVC), FlashSystem V9000, Storwize V7000 and V5000 products, as well as Spectrum Virtualize sold as software.
In version 7.7, IBM raised the limits. You can now have 10,000 virtual disks per cluster, rather than 2,048 per node-pair. Also, you can now have up to 512 compressed volumes per node-pair. With the new 5U-high 92-drive expansion drawers, Storwize V7000 can now support up to 3,040 drives, and Storwize V5030 can support up to 1,520 drives.
While each Spectrum Virtualize node has redundant components, the architecture is designed to handle entire node failure. The term "I/O Group" was created to refer to the node-pair of Spectrum Virtualize engines and the set of virtual disks it manages. This made sense when virtual disks were dedicated to a single node-pair. Now, virtual disks can be assigned to multiple node-pairs, dynamically adding or removing node-pairs as needed for each virtual disk.
However, even if you have a virtual disk assigned to multiple node-pairs, only one node-pair would manage its cache, causing all other node-pairs to coordinate I/O through the cache-owning node-pair. The other node-pairs are called "access I/O groups".
The architecture allows for linear scalability, double the number of nodes, and you double your performance. Some competitors use n-way caching across four or more nodes, and it is a semi-religious argument on the pros and cons of each approach. Barry feels the 2-way caching implemented by Spectrum Virtualize is the most effective and efficient for performance.
All of the nodes are connected over IP network, but there is one designated as a "config node", and one, often the same, as a "boss node".
A cluster can have up to three physical quorum disks (either drive or mDisk) and optionally up to five IP-based quorums. The IP-based is just a Java program that runs on any server or Cloud, provided it can respond within 80 msec.
Either IP-based or physical quorum can be used for "tie-breaking" a split-brain situations. In the event there is no "active" quorum, the administrator can now serve as the tie-breaker manually. Barry recommends for Storwize clusters, where physical quorum disks are attached to a single node-pair, that you have at least one IP-based quorum for tie-breaking.
However, only physical quorum can be used for T3 Recovery. T3 Recovery happens after power outages. All of the nodes update the quorum disk with critical information of all of the virtual mappings of blocks to volumes, and this is used when bringing up the nodes again.
To protect against one pool consuming all of the cache, Spectrum Virtualize will partition the cache, and prevent any one pool from consuming more than a certain percentage of the total cache. The percentage depends on the number of pools:
Number of Pools
Max percentage of any individual pool
5 or more
Barry explained how failover works in the event of node failure. There is voting involved, and the majority remains in the cluster. In the case of an even split, called a "split brain" situation, the quorum decides. Orphaned nodes in a node-pair go into write-through mode, since the cache is no longer mirrored.
The I/O forwarding layer has been split between upper and lower roles. The upper layer handles access I/O groups. The lower layer handles asymmetric access to drives, mDisks and arrays.
N-port ID Virtualization (NPIV) drastically improves multi-pathing. Perhaps one of the coolest improvements in awhile, NPIV allows us to assign "Virtual" WWPN to other ports. When an I/O sent to a single port fails, it retries one or more times again, then waits 30 seconds, and then invokes multi-pathing to find a completely different path to the data. With NPIV, when a port fails, its WWPN is re-assigned to a different port, so the retries are likely to be successful before having to wait 30 seconds!
Lastly, Barry covered the delicate art of Software upgrades. Software is rolled forward one node at a time, and the "cluster state" is maintained during this time.
Different presentations this week are at different technical levels. My session was meant to be an overview of the concepts of Business Continuity, independent of specific operating system platform, using specific IBM products to help illustrate specific examples. Barry's was a deep dive into a single product family.
This week, I am presenting at the IBM Systems Technical University in Orlando, Florida, May 22-26, 2017. Here's my recap of the afternoon sessions of Day 1.
Storage Brand Opening Session - Craig Nelson
Craig Nelson, Brocade manager for IBM Field Sales Channel, indicated the network equipment is the bridge that brings servers and storage together.
The squeeze -- faster servers and Flash storage causes storage networking to become the bottleneck. Fibre Channel will remain the protocol of choice for the next decade.
"Speed is the net currency of Business" -- Marc Benioff, Salesforce CEO.
Craig drew an analogy. We have been focused on making hard disk drives faster, and then Flash changed the game. Likewise, car manufacturers have focused on making gas engines better, and then Tesla Motors introduces an electric car with insane performance. The early models actually had an "Insane Mode".
The new Gen6 models of IBM b-type SAN equipment will support 32Gbps and 128Gbps ports. That's Insane!
Later models of Tesla Motors offer a "Ludicrous Mode". For flash storage, it is NVMe. NVMe can get storage down to 20 microsecond latency. That's Ludicrous!
Craig put in a plug for two Brocade sessions: "BEWARE - The four potholes on your road to success when deploying flash storage" and "Tune up your storage network! Is it healthy enough for flash storage and next-gen server platforms?"
Storage Brand Opening Session - Clod Barrera
Clod Barrera, IBM Distinguished Engineer and Chief Technical Strategist, presenting storage industry trends.
IDC predicts data capacity to grow 60-80% CAGR. This would require 44 percent drop in $/GB per year to maintain flat budget. Unfortunately, flash media cost is only dropping 25-30 percent per year, and spinning disk only 19 percent per year.
Since storage media will not offset capacity growth, we need other technologies to compensate, including compression, deduplication, defensible disposal, and "cold" storage to tape or optical media.
The smallest persistent storage that IBM has been able to achieve is 12 atoms. Current disk technology is 1200 atoms. Since 1956, IBM and the rest of the IT industry have improved storage 9 orders of magnitude, and now there are only 2 orders of magnitude left.
Clod poked fun at the "Star Wars: Rogue One" movie, indicating that their idea of the future of storage was a huge tape library. See my December 2016 blog post [Has your data gone rogue?]
What does it take to storage information forever? Tape will certainly be around. IBM Zurich demonstrated a 220TB back in 2015 as proof of technology.
A good example of the need for long-term retention are US films. Of those from the silent era, over 90 percent are lost. Over half of the films prior to 1950 are lost. The silver nitrate film stock that the reels were made of have deteriorated. Now that more movies are made digitally, can we do better?
Clouds will move from 10GbE to 25GbE. No slow down for FC in datacenters. Flash storage and object storage are both growing quickly
Move over Software-Defined Storage, Converged and Hyperconverged systems, the new up-and-coming thing are "Composable Systems deployed in Pods" adjustable hourly by workload requirements.
To protect against Ransomware, use "air gap" protection, not on the same network as production workload.
New storage models are needed for Cognitive workloads. Clod put in a plug for Joe Dain's presentation "Introducing cognitive index and search for IBM Cloud Object Storage leveraging Watson"
Storage Brand Opening Session - Axel Koester
Axel Koester, IBM Storage Chief Technologist, presented more storage industry directions.
What will the world look like in 10 years. Today mostly procedural programming, with some statistical big data, and a bit of machine learning. In 10 years, it will be mostly statistical and machine learning, with very little procedural programming. Why? Because it is faster to train computers with Machine Learning, than to program procedurally.
Examples of machine learning are IBM Watson, Google AlphaGo, drive-AI. Axel would rather be a passenger in a machine-learned self-driving car, than a procedurally-programmed one.
Neural networks to interpret hand-written numbers. Welcome to "Unsupervised learning".
A subset of Machine Learning is Deep Learning, a major breakthrough in 2006. Deep Learning is a subset of Machine Learning that uses three or more layers of neural networks. For example, face recognition "deep learning" algorithms can also be used to detect defects through visual inspection of circuit boards.
How does this impact storage?
Procedural -- archive test cases used
Statistical -- store all data for parallel processing
Machine Learning - train sample data, then archive and re-train yearly. Driving 5 minutes = 4 TB of sensor data used for self-driving cars
For Neural processing, x86 CPU are suitable for prototyping. GPU co-processors better, efficient but uncommon. IBM has developed the "TrueNorth" chip does nothing by Neural - 4096 cores with only 70 mW of energy consumption. No clock, instead dendrites, synapses, axons and neurons.
Instead of "Build or Buy?" the new question is "Train or Buy?" Train with confidential data, or buy ready-to-run 100% pre-trained cognitive systems as a service.
AI Frameworks are available on Docker containers with Kubernetes with Persistent storage (Ubiquity) such as Spectrum Scale. These frameworks include DL4J, Chainer, Caffe, torch, theano, tensorflow.
NVMe -- NVM is local only, how to do HA and DR? There are three options:
DB asynchronous shadowing
DB mirroring over NVMeOF
Cluster file system replication of persistent data, such as IBM Spectrum Scale
Example car manufacturer with 50 SAP HANA in memory instances on 4 Spectrum Scale nodes. IBM achieved 50,000 new files per second. Most NAS systems do much less.
Faster media on smaller electronics Holmium atoms on Magnesium Oxide on silver base, resulting in "single atom storage." ATM needle tip magnetizes, measured with Tunnel Magneto-resistance. Unfortunately, reading the data causes it to lose its value, so it is not as persistent as the 12-atom method described by Clod earlier.
As the title suggests, I explained why there is so much interest in Software-Defined Storage in the IT industry, what software-defined storage is, and how to deploy these solutions in your existing infrastructure without the full rip-and-replace. I covered which IBM products are available as software, pre-built systems and/or Cloud services.