Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Systems Client Experience Center in Tucson, Arizona, and a featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
This week, I am presenting at the IBM Systems Technical University in Orlando, Florida, May 22-26, 2017. Here's my recap of the afternoon sessions of Day 2.
IBM Spectrum Protect deep dive into Container Storage Pools
Ron Henkhaus, IBM Certified Consulting IT Specialist, presented the new Spectrum Protect concept of "Container Pools" that can either be "Directory Pools" on SAN or NAS-based disk storage, or "Cloud Pools". Container pools can contain both deduplicated and non-deduplicated data.
Ron cautioned that directory pools should not be placed on the same file system as your Spectrum Protect database or logs. Also, best practice for any directory pool is to assign it an "overflow" pool that is not a directory pool, such as disk, tape or cloud container.
Cloud pools can use any of OpenStack Swift, V1 Swift, Amazon S3 protocol, Amazon Web Services, IBM Bluemix, or IBM Cloud Object Storage. You can pre-define the vaults and buckets in the configuration.
For off-premises Cloud pools, the data is encrypted by default. For other container pools, encryption is optional. Performance to Cloud pools has been improved by using "accelerator storage", basically a disk cache to collect data before sending it over to the Cloud pool. Backups to Cloud pools can reach 8 TB per hour. Restore rates vary from 500 to 1500 GB per hour.
Container Pools were designed for the new "Deduplication 2.0" feature introduced in version 7. Traditional Dedupe 1.0 to Device Class FILE is still available, but not recommended.
Version 7.1.6 changed the compression algorithm from LZW to LZ4. In all cases, Spectrum Protect performs these actions in this order: deduplication, compression, encryption. Data that is encrypted by the Spectrum Protect client is therefore not deduped.
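To see why the order matters, here is a minimal sketch of the idea, not Spectrum Protect's actual implementation. It assumes fixed 4 KiB chunks, SHA-256 content hashes for deduplication, zlib as a stand-in for LZ4, and a hypothetical `toy_encrypt` function that is for illustration only and is not real cryptography.

```python
import hashlib
import os
import zlib


def toy_encrypt(chunk: bytes, key: bytes) -> bytes:
    """Toy stream cipher for illustration only; NOT real cryptography."""
    nonce = os.urandom(16)  # fresh random nonce for every chunk
    stream = b""
    counter = 0
    while len(stream) < len(chunk):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        stream += block
        counter += 1
    body = bytes(a ^ b for a, b in zip(chunk, stream))
    return nonce + body


def unique_chunks(chunks):
    """Deduplicate by content hash, keeping the first copy of each chunk."""
    seen = {}
    for chunk in chunks:
        seen.setdefault(hashlib.sha256(chunk).digest(), chunk)
    return list(seen.values())


# Ten identical 4 KiB chunks, e.g. the same file backed up ten times.
chunks = [b"backup data " * 341 + b"pad!"] * 10
key = b"hypothetical-client-key"

# Order described in the talk: deduplicate, then compress, then encrypt.
deduped = unique_chunks(chunks)
stored = [toy_encrypt(zlib.compress(c), key) for c in deduped]

# Encrypting first: every ciphertext is different, so nothing deduplicates.
pre_encrypted = [toy_encrypt(c, key) for c in chunks]
deduped_after = unique_chunks(pre_encrypted)

print(len(deduped), "unique chunk(s) when deduplicating first")
print(len(deduped_after), "unique chunk(s) when encrypting first")
```

Because each encrypted copy carries its own random nonce, identical plaintext no longer hashes to the same value, which is the reason client-encrypted data cannot be deduped downstream.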
The "Protect Storage Pool" command can replicate a directory pool to either a remote directory pool or Cloud pool. In addition to this remote replication, you can copy a directory pool to tape to offer air-gap protection against ransomware. Such tapes are considered part of the "Copy Container Pool". In the event of directory pool corruption, the data can be repaired from either replication or tape.
IBM Aspera can now be used for replication, using SSL and AES 128-bit encryption. If your latency is greater than 50 msec, and you have more than 0.5 percent packet loss, Aspera might help. This is available for Linux on x86 platforms running v7.1.6 or higher.
For existing customers, IBM Spectrum Protect allows you to convert your FILE, VTL and TAPE device class pools to directory or Cloud pools.
Introduction to IBM Cloud Object Storage (powered by Cleversafe)
In 2015, IBM acquired Cleversafe, recognized as the #1 Object Storage vendor. Their flagship product was officially renamed to the IBM Cloud Object Storage System, which some abbreviate informally as IBM COS. IBM offers the IBM Cloud Object Storage System in three ways: as software, as pre-built systems, and as a cloud service on IBM Bluemix (formerly known as SoftLayer).
Since then, IBM has been busy integrating IBM COS into the rest of the storage portfolio. I explained how IBM COS can be used for all kinds of static-and-stable data, but is not suited for frequently changed data, such as Virtual machines or Databases.
Object storage can be accessed via NFS or SMB NAS-protocols using a gateway product, like IBM Spectrum Scale, or those from third-party partners like Ctera, Avere, Nasuni or Panzura. It can also be used as an alternative to tape for backup copies, and is already supported by the major backup software like IBM Spectrum Protect, Commvault Simpana, or Veritas NetBackup.
While other cloud service providers have offered data storage in the cloud, this new offering also allows hybrid configurations with geographically dispersed erasure coding.
Unlike RAID, which protects against the loss of one or two drives, erasure coding can protect against a larger number of concurrent failures. For example, using an Information Dispersal Algorithm (IDA) of "7+5", where seven pieces of data are encoded on twelve independent disks, the system can lose up to five disk drives without losing any data.
Combining this with Geographically Dispersed Configuration across three or more sites means that you can lose an entire data center, four of the twelve disks, and still have instant full access to all of your data from the eight drives at the other locations. In the graphic, you see two on-premises data centers combined with a third location in IBM SoftLayer.
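The arithmetic behind the "7+5" example can be sketched in a few lines. The function name `ida_summary` is hypothetical, and the sketch assumes the twelve slices are spread evenly across the sites, four per site, as in the example above.

```python
def ida_summary(data_slices: int, total_slices: int, sites: int) -> dict:
    """Summarize fault tolerance for a dispersal scheme such as "7+5"."""
    parity_slices = total_slices - data_slices
    slices_per_site = total_slices // sites  # assumes an even spread
    survivors = total_slices - slices_per_site
    return {
        # Any `data_slices` of the `total_slices` are enough to read the data.
        "tolerated_failures": parity_slices,
        # Raw capacity consumed per unit of usable capacity.
        "raw_per_usable": total_slices / data_slices,
        "slices_per_site": slices_per_site,
        "survivors_after_site_loss": survivors,
        "readable_after_site_loss": survivors >= data_slices,
    }


summary = ida_summary(data_slices=7, total_slices=12, sites=3)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With three sites, losing one leaves eight slices, one more than the seven needed, so the data stays readable and one further drive failure can still be absorbed.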
New Generation of Storage Tiering: Simpler Management, Lower Costs, and Improved Performance
With ever changing amounts of storage, it is hard to find metrics that are consistent year to year. Fortunately, I found I/O density to be the metric on which to focus my efforts, armed with real data from Intelligent Information Lifecycle Management (IILM) studies done at various clients. From that, I was able to talk about storage tiering on three fronts:
Storage tiering between Flash and disk. IBM FlashSystem and IBM Easy Tier on DS8000 and Spectrum Virtualize family for hybrid Flash-and-disk configurations.
Storage tiering between disk, tape, and Cloud. HSM and Information Lifecycle Management (ILM) on Spectrum Scale, Elastic Storage Server (ESS), Spectrum Archive and IBM Cloud Object Storage System.
Storage tiering automation across your entire environment. IILM studies can help identify a target mix of Tier 0, Tier 1, Tier 2 and Tier 3 storage. IBM Spectrum Storage Suite and the Virtual Storage Center (VSC) can recommend or perform the movement of LUNs to more appropriate tiers, based on age and I/O density measurements.
It's hard to say what the correct sequence of presentations should be. Some thought it might have been better for my talk on IBM Cloud Object Storage System prior to Ron's talk on Cloud container pools, but perhaps hearing Ron first helped drive more interest to my session.
I have been involved with Business Continuity and Disaster Recovery my entire career at IBM System Storage. However, with new workloads like Hadoop analytics and new Hybrid Cloud deployments, I thought it would be good to provide a refresh.
The need for Business Continuity and Disaster Recovery has increased recently due to (a) climate change caused by human activity, (b) ransomware and other cyber attacks, and (c) disgruntled employees.
Back in 1983, a task force of IBM clients at a GUIDE conference developed "Seven Business Continuity Tiers for Disaster Recovery", which I refer to as "BC Tiers". I divided the presentation into three sections:
Backup and Restore: BC tiers 1 through 3 are based on backup and restore methodologies. I explained how to back up Hadoop analytics data, all of the various options for IBM Spectrum Protect software, and how to encrypt the tape data that gets sent off premises.
Rapid Data Recovery: BC tiers 4 and 5 reduce the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) with snapshots, database journal shadowing, and IBM Cloud Object Storage.
Continuous Operations: BC tiers 6 and 7 provide data replication mirroring across locations. I covered 2-site, 3-site and 4-site configurations.
IBM Spectrum Virtualize - How it works - Deep dive
Barry Whyte, IBM Master Inventor and ATS for Spectrum Virtualize, covered a variety of internal topics "under the hood" of Spectrum Virtualize. This covers the SAN Volume Controller (SVC), FlashSystem V9000, Storwize V7000 and V5000 products, as well as Spectrum Virtualize sold as software.
In version 7.7, IBM raised the limits. You can now have 10,000 virtual disks per cluster, rather than 2,048 per node-pair. Also, you can now have up to 512 compressed volumes per node-pair. With the new 5U-high 92-drive expansion drawers, Storwize V7000 can now support up to 3,040 drives, and Storwize V5030 can support up to 1,520 drives.
While each Spectrum Virtualize node has redundant components, the architecture is designed to handle entire node failure. The term "I/O Group" was created to refer to the node-pair of Spectrum Virtualize engines and the set of virtual disks it manages. This made sense when virtual disks were dedicated to a single node-pair. Now, virtual disks can be assigned to multiple node-pairs, dynamically adding or removing node-pairs as needed for each virtual disk.
However, even if you have a virtual disk assigned to multiple node-pairs, only one node-pair would manage its cache, causing all other node-pairs to coordinate I/O through the cache-owning node-pair. The other node-pairs are called "access I/O groups".
The architecture allows for linear scalability: double the number of nodes, and you double your performance. Some competitors use n-way caching across four or more nodes, and it is a semi-religious argument on the pros and cons of each approach. Barry feels the 2-way caching implemented by Spectrum Virtualize is the most effective and efficient for performance.
All of the nodes are connected over an IP network, but there is one designated as a "config node", and one, often the same, as a "boss node".
A cluster can have up to three physical quorum disks (either drive or mDisk) and optionally up to five IP-based quorums. The IP-based quorum is just a Java program that runs on any server or Cloud, provided it can respond within 80 msec.
Either IP-based or physical quorum can be used for "tie-breaking" in split-brain situations. In the event there is no "active" quorum, the administrator can now serve as the tie-breaker manually. Barry recommends for Storwize clusters, where physical quorum disks are attached to a single node-pair, that you have at least one IP-based quorum for tie-breaking.
However, only physical quorum can be used for T3 Recovery. T3 Recovery happens after power outages. All of the nodes update the quorum disk with critical information of all of the virtual mappings of blocks to volumes, and this is used when bringing up the nodes again.
To protect against one pool consuming all of the cache, Spectrum Virtualize will partition the cache, and prevent any one pool from consuming more than a certain percentage of the total cache. The percentage depends on the number of pools:
[Table: "Number of Pools" versus "Max percentage of any individual pool"; the only surviving row label is "5 or more".]
Barry explained how failover works in the event of node failure. There is voting involved, and the majority remains in the cluster. In the case of an even split, called a "split brain" situation, the quorum decides. Orphaned nodes in a node-pair go into write-through mode, since the cache is no longer mirrored.
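The voting rule can be sketched as follows. This is a simplified illustration of majority voting with a quorum tie-breaker, not Spectrum Virtualize's actual code. The function `surviving_partition` and the `quorum_winner` parameter (the node that reached the active quorum first) are hypothetical names introduced for this example.

```python
def surviving_partition(partition_a, partition_b, quorum_winner=None):
    """Return the set of nodes that stays in the cluster, or None if undecided.

    partition_a, partition_b: sets of node names that can still see each other.
    quorum_winner: node that won the race to the active quorum (assumption);
                   None means there is no active quorum to break the tie.
    """
    a, b = set(partition_a), set(partition_b)
    # The majority remains in the cluster.
    if len(a) > len(b):
        return a
    if len(b) > len(a):
        return b
    # Even split ("split brain"): the quorum decides.
    if quorum_winner in a:
        return a
    if quorum_winner in b:
        return b
    # No active quorum: wait for a manual tie-break by the administrator.
    return None


print(surviving_partition({"node1", "node2", "node3"}, {"node4"}))
print(surviving_partition({"node1", "node2"}, {"node3", "node4"}, quorum_winner="node3"))
print(surviving_partition({"node1", "node2"}, {"node3", "node4"}))
```

The last call returns `None`, which corresponds to the case where the administrator has to act as the tie-breaker manually.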
The I/O forwarding layer has been split between upper and lower roles. The upper layer handles access I/O groups. The lower layer handles asymmetric access to drives, mDisks and arrays.
N-port ID Virtualization (NPIV) drastically improves multi-pathing. Perhaps one of the coolest improvements in a while, NPIV allows us to assign "Virtual" WWPN to other ports. When an I/O sent to a single port fails, the host retries one or more times, waits 30 seconds, and then invokes multi-pathing to find a completely different path to the data. With NPIV, when a port fails, its WWPN is re-assigned to a different port, so the retries are likely to be successful before having to wait 30 seconds!
Lastly, Barry covered the delicate art of Software upgrades. Software is rolled forward one node at a time, and the "cluster state" is maintained during this time.
Different presentations this week are at different technical levels. My session was meant to be an overview of the concepts of Business Continuity, independent of specific operating system platform, using specific IBM products to help illustrate specific examples. Barry's was a deep dive into a single product family.
This week, I am presenting at the IBM Systems Technical University in Orlando, Florida, May 22-26, 2017. Here's my recap of the afternoon sessions of Day 1.
Storage Brand Opening Session - Craig Nelson
Craig Nelson, Brocade manager for IBM Field Sales Channel, indicated the network equipment is the bridge that brings servers and storage together.
The squeeze -- faster servers and Flash storage cause storage networking to become the bottleneck. Fibre Channel will remain the protocol of choice for the next decade.
"Speed is the net currency of Business" -- Marc Benioff, Salesforce CEO.
Craig drew an analogy. We have been focused on making hard disk drives faster, and then Flash changed the game. Likewise, car manufacturers have focused on making gas engines better, and then Tesla Motors introduced an electric car with insane performance. The early models actually had an "Insane Mode".
The new Gen6 models of IBM b-type SAN equipment will support 32Gbps and 128Gbps ports. That's Insane!
Later models of Tesla Motors offer a "Ludicrous Mode". For flash storage, it is NVMe. NVMe can get storage down to 20 microsecond latency. That's Ludicrous!
Craig put in a plug for two Brocade sessions: "BEWARE - The four potholes on your road to success when deploying flash storage" and "Tune up your storage network! Is it healthy enough for flash storage and next-gen server platforms?"
Storage Brand Opening Session - Clod Barrera
Clod Barrera, IBM Distinguished Engineer and Chief Technical Strategist, presented storage industry trends.
IDC predicts data capacity to grow 60-80% CAGR. This would require a 44 percent drop in $/GB per year to maintain a flat budget. Unfortunately, flash media cost is only dropping 25-30 percent per year, and spinning disk only 19 percent per year.
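The 44 percent figure follows from simple arithmetic if spending is modeled as capacity times price per GB, which is my simplifying assumption here. With capacity growing by a factor of (1 + g), the price must fall by 1 - 1/(1 + g) to keep spending flat. The function names below are hypothetical.

```python
def required_price_drop(capacity_growth: float) -> float:
    """Yearly $/GB drop needed to keep spending flat at a given capacity CAGR."""
    return 1 - 1 / (1 + capacity_growth)


def budget_multiplier(capacity_growth: float, price_drop: float) -> float:
    """Year-over-year change in spending for a given growth and price drop."""
    return (1 + capacity_growth) * (1 - price_drop)


for growth in (0.60, 0.80):
    drop = required_price_drop(growth)
    print(f"{growth:.0%} capacity growth needs a {drop:.1%} price drop")

# What the quoted media price declines mean at 80 percent capacity growth.
for media, price_drop in (("flash, best case", 0.30), ("spinning disk", 0.19)):
    change = budget_multiplier(0.80, price_drop) - 1
    print(f"{media}: spending grows {change:.0%} per year")
```

The 44 percent figure corresponds to the top of the 60-80 percent range. Even at the low end, the required 37.5 percent drop is more than either media type delivers.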
Since storage media will not offset capacity growth, we need other technologies to compensate, including compression, deduplication, defensible disposal, and "cold" storage to tape or optical media.
The smallest persistent storage that IBM has been able to achieve is 12 atoms. Current disk technology is 1200 atoms. Since 1956, IBM and the rest of the IT industry have improved storage 9 orders of magnitude, and now there are only 2 orders of magnitude left.
Clod poked fun at the "Star Wars: Rogue One" movie, indicating that their idea of the future of storage was a huge tape library. See my December 2016 blog post [Has your data gone rogue?]
What does it take to store information forever? Tape will certainly be around. IBM Zurich demonstrated a 220TB tape cartridge back in 2015 as proof of technology.
A good example of the need for long-term retention is US films. Of those from the silent era, over 90 percent are lost. Over half of the films prior to 1950 are lost. The silver nitrate film stock that the reels were made of has deteriorated. Now that more movies are made digitally, can we do better?
Clouds will move from 10GbE to 25GbE. No slow down for FC in datacenters. Flash storage and object storage are both growing quickly.
Move over Software-Defined Storage, Converged and Hyperconverged systems, the new up-and-coming thing is "Composable Systems deployed in Pods" adjustable hourly by workload requirements.
To protect against Ransomware, use "air gap" protection, not on the same network as production workload.
New storage models are needed for Cognitive workloads. Clod put in a plug for Joe Dain's presentation "Introducing cognitive index and search for IBM Cloud Object Storage leveraging Watson"
Storage Brand Opening Session - Axel Koester
Axel Koester, IBM Storage Chief Technologist, presented more storage industry directions.
What will the world look like in 10 years? Today it is mostly procedural programming, with some statistical big data, and a bit of machine learning. In 10 years, it will be mostly statistical and machine learning, with very little procedural programming. Why? Because it is faster to train computers with Machine Learning than to program procedurally.
Examples of machine learning are IBM Watson, Google AlphaGo, drive-AI. Axel would rather be a passenger in a machine-learned self-driving car, than a procedurally-programmed one.
Neural networks to interpret hand-written numbers. Welcome to "Unsupervised learning".
Deep Learning, a major breakthrough in 2006, is a subset of Machine Learning that uses three or more layers of neural networks. For example, face recognition "deep learning" algorithms can also be used to detect defects through visual inspection of circuit boards.
How does this impact storage?
Procedural -- archive test cases used
Statistical -- store all data for parallel processing
Machine Learning -- train sample data, then archive and re-train yearly. Driving 5 minutes = 4 TB of sensor data used for self-driving cars
For Neural processing, x86 CPUs are suitable for prototyping. GPU co-processors are better, efficient but uncommon. IBM has developed the "TrueNorth" chip that does nothing but Neural processing -- 4096 cores with only 70 mW of energy consumption. No clock, instead dendrites, synapses, axons and neurons.
Instead of "Build or Buy?" the new question is "Train or Buy?" Train with confidential data, or buy ready-to-run 100% pre-trained cognitive systems as a service.
AI Frameworks are available on Docker containers with Kubernetes with Persistent storage (Ubiquity) such as Spectrum Scale. These frameworks include DL4J, Chainer, Caffe, torch, theano, tensorflow.
NVMe -- NVM is local only, so how do you provide HA and DR? There are three options:
DB asynchronous shadowing
DB mirroring over NVMeOF
Cluster file system replication of persistent data, such as IBM Spectrum Scale
Example: a car manufacturer with 50 SAP HANA in-memory instances on 4 Spectrum Scale nodes. IBM achieved 50,000 new files per second. Most NAS systems do much less.
Faster media on smaller electronics: Holmium atoms on Magnesium Oxide on a silver base, resulting in "single atom storage." An STM (scanning tunneling microscope) needle tip magnetizes the atom, and its state is measured with Tunnel Magneto-resistance. Unfortunately, reading the data causes it to lose its value, so it is not as persistent as the 12-atom method described by Clod earlier.
As the title suggests, I explained why there is so much interest in Software-Defined Storage in the IT industry, what software-defined storage is, and how to deploy these solutions in your existing infrastructure without the full rip-and-replace. I covered which IBM products are available as software, pre-built systems and/or Cloud services.
This week, I am presenting at the IBM Systems Technical University in Orlando, Florida, May 22-26, 2017. Day 1 included keynote sessions. Here is my recap for the morning.
General Session "The Quantum Age"
Amy Hirst, IBM Director of Systems Training, served as emcee for the General Session. The theme this week is "Power of Knowledge, Power of Technology, Power of You. You to the IBM'th power".
Chris Schnabel, IBM Q Offering Manager, explained what "IBM Q" is.
Chris feels "our intuition of what we can compute is wrong". Classical (non-Quantum) computing has evolved over the past 100 years.
Consider Molecular geometry. The best supercomputer can only handle the smallest molecules, those with 40 to 50 electrons, and even then is unable to calculate bond lengths within 10 percent accuracy. Quantum computing can.
Another area is what computer scientists call the "Traveling Salesman Problem". If you had a list of 57 cities, what would be the optimal path to minimize the distance traveled to get to all of the cities? An exhaustive search would have to consider on the order of 10 to the 76th power routes. Dynamic Programming techniques provide some shortcuts, reducing this down to about 10 to the 20th power steps, but still, that is impossible on most computers.
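Both orders of magnitude can be checked quickly. This sketch assumes the exhaustive search counts every ordering of the 57 cities (57 factorial), and that the dynamic programming shortcut is the classic Held-Karp algorithm, whose step count grows as n squared times 2 to the n. The talk did not name a specific algorithm, so that part is my assumption.

```python
import math

n = 57  # number of cities

# Exhaustive search: try every ordering of the cities.
exhaustive = math.factorial(n)

# Held-Karp dynamic programming: roughly n^2 * 2^n steps (assumed algorithm).
held_karp = n**2 * 2**n

print(f"exhaustive search: about 10^{int(math.log10(exhaustive))} orderings")
print(f"dynamic programming: about 10^{int(math.log10(held_karp))} steps")
```

Python's arbitrary-precision integers make this exact, so there is no overflow to worry about even at 77 digits.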
Chris mentioned that there are easy problems to solve in polynomial time, and hard problems that are exponential, in that they get worse and worse the bigger the input set. There will always be hard problems.
"Nature isn't classical, dammit, and if you want to make a simulation of nature, you'd better make it quantum mechanical, and by golly it's a wonderful problem, because it doesn't look so easy."
-- Richard Feynman
Nature encodes information, but not in ones and zeros. Quantum computers are measured on the number of Qubits, their error rate, etc. The three factors that IBM focuses on are Coherence, Controllability and Connectivity.
Chris explained how Superposition and Entanglement are used in Quantum Computers. I won't bore you with the details here, but rather save this for a future post.
Today: 5 to 16 Qubits (can be simulated with today's classical computers. 5 Qubits is the power of your typical laptop)
Near future: 50-100 Qubits (too big to simulate on supercomputers), with answers that are approximate or correct only 2/3 of the time.
Future: millions of Qubits, fault-tolerant to provide exact, precise answers consistently.
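One common way to see why 50 Qubits is out of reach for classical simulation is to count the memory a full state vector needs: 2 to the n complex amplitudes for n Qubits. The sketch below assumes a straightforward state-vector simulator storing each amplitude as a 16-byte double-precision complex number. Other simulation methods exist, so treat this as a rough guide only.

```python
def state_vector_bytes(qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for a full state vector: 2**qubits complex amplitudes."""
    return (2 ** qubits) * bytes_per_amplitude


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units."""
    units = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1024


for qubits in (5, 16, 50):
    print(f"{qubits} Qubits: {human_readable(state_vector_bytes(qubits))}")
```

Each additional Qubit doubles the memory, which is why the jump from 16 to 50 Qubits moves the problem from a laptop to beyond a supercomputer.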
Quantum Computing opens up a new range of problems, what Chris calls "Quantum Easy" problems. Problems that might take years to solve on classical supercomputers could be solved in seconds on a Quantum computer.
Chris showed a picture of [Colossus], the first digital electronic computer used in the 1940s. Quantum computing today is like the 1940s of classical computing.
IBM is now working on Hybrid Quantum-Classical algorithms, for example:
Quantum Chemistry - can be used in material design, healthcare pharmaceuticals
Optimization - logistics/shipping, risk analytics
There are different ways to build a quantum computer. IBM chose a single-junction transmon design, using Josephson junctions. While the chips are small, the refrigerators they are contained in are huge, and have to keep the chips at a very cold temperature of 15 millikelvin (minus 459 Fahrenheit)!
To get people excited about Quantum computing, IBM created the "IBM Q Experience" [ibm.com/ibmq] that allows the public to run algorithms on a basic 5 Qubit system using a simple drag-and-drop interface to put different transformational gates in sequence.
The IBM Research team was shocked to see 17 publications in prestigious journals make practical use of this 5 qubit system! Since then, IBM has offered a Software Developers Kit (SDK) called QISkit (pronounced Cheese-kit) as a text-based alternative to the drag-and-drop interface.
Amy Hirst came back on stage to remind people to use Twitter hashtag #ibmtechu to follow the event. There are two more events like this planned for the end of the year. A Power/Storage conference in New Orleans, October 16-20, and another event focused on z Systems mainframe, November 13-17.
Pendulum Swings Back -- Understanding Converged and Hyperconverged Systems
This presentation has an interesting back-story. At a client briefing, I was asked to explain the difference between "Converged" and "Hyperconverged" Systems, which I did with the analogy of a pendulum. I used the whiteboard, and then later made it into a single chart.
At the far left of the pendulum, I start with mainframe systems of the early 1950s that had internal storage. As the pendulum swings to the middle, I discuss the added benefits of external storage, from RAID protection and Cache memory to centralized management and backup.
To the far right of the pendulum, it swings over to networked storage, from NAS to SAN attached devices for flash, disk and tape. This offers excellent advantages, including greater host connectivity, and greater distances supported to help with things like disaster recovery.
Here is where the pendulum swings back. IBM introduced the AS/400 a long while ago, and more recently IBM PureSystems that combined servers, storage and switches into a single rack configuration. Other vendors had similar offerings, such as VCE Vblock, Flexpod from NetApp and Cisco, and Oracle Exadata.
Lately, the pendulum has swung fully back to internal storage, with storage-rich servers running specialized software on commodity servers. There are two kinds:
Pre-built systems like Nutanix, Simplivity or EVO:Rail which are x86 based server systems, pre-installed with software and internal flash and disk storage.
Software that can be deployed on your own choice of hardware, such as IBM Spectrum Accelerate, IBM Spectrum Scale FPO, or VMware VSAN.
So, over time, my single slide has evolved, and fleshed out into a full blown hour-long presentation!
Cloud storage comes in four flavors: persistent, ephemeral, hosted, and reference. The first two I refer to as "Storage for the Compute Cloud" and the latter two I refer to as "Storage as the Storage Cloud".
I also explained the differences between block, file and object access, and why different Cloud storage types use different access methods.
Finally, I covered some of our new public cloud storage offerings, using OpenStack Swift and Amazon S3 protocols to access objects off premises, including the new Cold Vault and Flex pricing on IBM Cloud Object Storage System in IBM Bluemix Cloud.
(FCC Disclosure: I work for IBM. I have no financial interest in SUSE, Scality, or any other storage vendor mentioned in this post. This blog post can be considered a "paid celebrity endorsement" for IBM Storwize, IBM Cloud Object Storage, and IBM Spectrum Storage software mentioned below.)
The study takes a realistic request for 250 TB of storage, at a 25 percent compound annual growth rate (CAGR), to store infrequently accessed data in an online archive, and then looks at the Total Cost of Ownership (TCO) over a five-year period.
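To get a feel for what that growth means in capacity terms, here is a quick sketch. It assumes the 250 TB is the starting point and that the 25 percent growth compounds once per year. The study may count the years differently, so the exact end-of-period figure is an assumption.

```python
def capacity_tb(initial_tb: float, cagr: float, years: int) -> float:
    """Capacity after compounding growth for a number of years."""
    return initial_tb * (1 + cagr) ** years


for year in range(6):
    print(f"year {year}: {capacity_tb(250, 0.25, year):7.1f} TB")
```

Under these assumptions, the capacity roughly triples over the period, so the cost of adding capacity matters as much as the initial purchase.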
The study compares five different Software-Defined Solutions and three pre-built systems. The Software-defined solutions come as software-only, requiring that you purchase the hardware separately and build it yourself. The three pre-built systems were chosen from the top three storage vendors in the marketplace: Dell EMC, IBM and NetApp.
The cost of support is factored in, as it should be. To keep things equal, no data reduction like data deduplication or compression was used.
In an odd approach, the study mixes block, file and object based approaches all in the same study.
You can read the full 14-page study (linked above). I have organized the results into a single table, ranked from best to worst, color coded for the best deals in green ($100K to $200K), moderate solutions in yellow ($200K to $300K) and most expensive in red (over $300K). I put the software-only options on the left and pre-built systems on the right.
SUSE Enterprise Storage 4
IBM Storwize V5010
DataCore SANsymphony
Red Hat Ceph Storage
Dell EMC Unity 300
I am often asked, "Isn't the software-only, build-it-yourself approach, always the lowest cost option?" Now, I can answer, "Sometimes yes, sometimes no." Fortunately, IBM offers Software-Defined Storage in a variety of packaging options including software-only, pre-built systems, and in the Cloud as a service.
IBM Storwize V5010 is based on IBM Spectrum Virtualize software, which you can deploy as software-only on your own x86 servers. This was not mentioned in the study, and perhaps it is my job to remind people that this option is also available for those who want to build their own storage.
For that matter, IBM Cloud Object Storage System -- available as software-only, pre-built systems, and in the Cloud -- might also be a cost-effective alternative.
Next week I will be in Orlando, Florida for the IBM Systems Technical University. If you are attending, stop by one of my presentations, or look for me at the Solution Center at one of the IBM peds, or attend the "Meet the Experts for IBM Storage" on Thursday!
I have been blogging for more than 10 years now, so I am no stranger to commenting on competitive comparisons. In some cases, I am setting the record straight, and other times, poking fun at competitor results, claims or conclusions. This comparison from Brian Carmody was too juicy to ignore.
(FCC Disclosure: I work for IBM. I have no financial interest in Infinidat, Dell EMC, nor Pure Storage, mentioned in this post. I do have friends and former co-workers who now work for Infinidat. This blog post can be considered a "paid celebrity endorsement" for IBM FlashSystem products.)
Here is an excerpt, I have added (Infinidat) wherever Brian says "we" just so there is no confusion:
"... So last week we (Infinidat) finally got around to running the same profiles against an INFINIDAT F6230 in our Waltham Solution Center, configured with 1.1TB of DDR-4 DRAM, 200TB TLC NAND, and 480 3TB Nearline HDDs.
In summary, we (Infinidat) wrecked the Pure and EMC systems. Here are the results side by side with EMC's data:
EMC Unity 600F
16K IOPS (80% Read)
9x Pure, 5x Unity
256K BW MBps
10.6x Pure, 3x Unity
4.5x Pure, 1.6x Unity
Steady-state latency (ms)
1/7 Pure, 1/2 Unity
By the way, we (Infinidat) took the liberty of running the test with a 200TB data set instead of Pure and EMC's 50TB because modern workloads require performance at scale, and we ran it with in-line compression enabled because our compression algorithm doesn't hurt performance.
This was an interesting test to run, and we (Infinidat) hope it helps the storage industry move away from media type wars and benchmarks (you will lose every time on performance if INFINIDAT is in the mix) ..."
Notice anything wrong here? Anything missing?
The Tortoise beat "Hare 1" and "Hare 2", but did not invite the Cheetah to the race?
Brian was smart enough not to compare their product to anything from IBM. IBM has a wide variety of All-Flash Arrays, including the DS8880F models, the Storwize V7000F and V5030F models, and Elastic Storage Server models. However, for this workload, IBM would probably recommend the FlashSystem V9000, A9000 or A9000R.
Any All-Flash Array with a steady-state latency of 2 milliseconds or greater is embarrassing, but then Infinibox is not really an All-Flash Array.
The architecture of their Infinibox appears much like the original XIV. It has a mix of DRAM memory and SSD cache, combined with spinning drives. It offers only compression, not data deduplication. Unlike the IBM XIV powered by six to 15 servers, the Infinibox appears under-powered with just three servers.
The Infinibox uses software-based in-line compression, which must put a huge tax on the few CPUs they have in those three servers. Infinidat chose not to compress the data in their cache, probably to reduce the additional overhead on their over-taxed CPUs.
The IBM FlashSystem V9000 has an innovative design, based on IBM Spectrum Virtualize, the mature software that you also find in the IBM SAN Volume Controller and Storwize family of products.
The FlashSystem V9000 offers hardware-accelerated compression. IBM takes advantage of the integrated Intel QuickAssist co-processor, which runs the compression algorithm 20 times faster than a standard Intel Broadwell CPU.
IBM compresses its cache, using a two-tier approach. The "upper cache" receives the data uncompressed, so that it can then tell the application to continue, for fastest turn-around time. Then the data is compressed, and stored in the "lower cache", optimizing the value and benefits of DRAM memory. Many databases get up to 80 percent savings, resulting in a 5-to-1 benefit in DRAM cache memory.
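The 5-to-1 figure follows directly from the 80 percent savings: if only 20 percent of the original bytes remain after compression, the same DRAM holds five times as much logical data. Here is a minimal sketch of that arithmetic. The function name is my own, for illustration only, and is not part of any IBM product or API.

```python
def effective_capacity_multiplier(savings_fraction: float) -> float:
    """Return how many times more logical data fits in the same physical space.

    savings_fraction is the share of space saved by compression,
    for example 0.80 for "80 percent savings".
    """
    if not 0.0 <= savings_fraction < 1.0:
        raise ValueError("savings_fraction must be at least 0 and less than 1")
    # Only (1 - savings) of the original bytes are stored, so capacity
    # grows by the reciprocal of that remaining share.
    return 1.0 / (1.0 - savings_fraction)


multiplier = effective_capacity_multiplier(0.80)
print(f"80 percent savings is about {multiplier:.0f}-to-1")
```

Note how quickly the ratio flattens at lower savings: 50 percent savings is only 2-to-1, which is why the type of data matters so much.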
The IBM FlashSystem A9000 and A9000R also have an innovative design, based on IBM Spectrum Accelerate, the code originally developed for the IBM XIV storage system.
(Fun fact: Infinidat's founder, [Moshe Yanai], was formerly the founder and designer of XIV, and it appears that Infinidat is just a re-design of old XIV technology architecture, re-packaged with a few differences. Since Moshe left, IBM has drastically enhanced the IBM XIV.)
Like the IBM Spectrum Virtualize family, the IBM FlashSystem A9000 and A9000R have hardware-accelerated in-line compression and a two-tier approach to cache. The "upper cache" receives the data uncompressed, then the data is compressed and deduplicated, and stored in the "lower cache", optimizing the value and benefits of DRAM memory.
The IBM FlashSystem A9000 and A9000R also offer in-line data deduplication. Modern workloads are virtualized, and Virtual Machine (VM) and Virtual Desktop Infrastructure (VDI) get significant benefits from data deduplication. Infinidat does not play here. For the FlashSystem A9000, most of the metadata related to data deduplication is in cache, minimizing the overhead.
IBM FlashSystem A9000 and A9000R have full performance that blows these published Infinibox results away WITH compression and deduplication turned on.
Brian ran a workload that used the DRAM and SSD cache exclusively, eliminating the reality that any REAL WORLD workload would have to tap into those much slower spinning drives. This is not really a side-by-side benchmark. He is comparing his live run on Infinibox to published numbers from a previous comparison run on a completely different set of data.
This raises the question, why pay for all those spinning drives at all, if you plan to only use the DRAM and Flash storage for your workloads?
A week later, Brian followed up with another post [The INFINIDAT Challenge], acknowledging his comparison was bogus. Here's an excerpt. Again, I have added (Infinidat) wherever Brian is referring to his employer just so there is no confusion:
"... It's not likely that a room full of storage engineers will ever agree on parameters for a synthetic benchmark since storage evaluations are competitive and control of test parameters will invariably predetermine the 'winner'. However, I hope we can all agree that synthetic benchmarks are a waste of time, and that real world performance is what matters in the data center.
So, what can we (Infinidat) do about it?
We (Infinidat) cordially invite every enterprise storage customer who wants lower latency and lower storage cost to visit [FasterThanAllFlash.com] and sign up for The INFINIDAT Challenge.
We (Infinidat) will Give you an Infinibox system to test
We (Infinidat) will Help you clone and test your environment with Infinibox
We (Infinidat) Guarantee your applications will run faster on Infinibox than your All-Flash Array.
If we (Infinidat) fail, we'll take the system back and Donate $10,000 to the charity of your choice.
If our technology delivers, you can keep the system, and we'll (Infinidat) Donate $10,000 in your name to the charity of our choice (The American Cancer Society).
Thanks again to all who participated in the dialog over the past week. I know the post generated some controversy. Traditional storage companies are fighting for their lives trying to keep enterprise storage expensive; indeed their business models are predicated upon maintaining price levels from a bygone era...."
As a consolidation play offering a full range of data services, I do not see this Infinibox working out. Clients who have the Infinibox tell me the performance deteriorates in REAL WORLD workloads as you add more data to the unit.
The Infinibox seems fine for workloads that do not demand high performance, so I was surprised Brian compared it to All-Flash arrays. The Infinibox is out of its league!
(To be fair, Pure Storage and EMC XtremIO aren't really in the same league as IBM FlashSystem, either, given that both of those products are based on commodity SSD. IBM FlashSystem models consistently deliver 4 to 10 times lower latency than these commodity-SSD-based competitors.)
The Infinibox also lacks features many people expect in an Enterprise-class storage array, like Call-Home capability to identify problems quickly, and Synchronous remote mirroring for disaster recovery. It is common for startups like Infinidat to deliver a [Minimum Viable Product] as their first offering.
To paraphrase Brian himself, your applications will lose every time on performance if INFINIDAT is in your datacenter.
The new TS1155 enterprise tape drive can write up to 15 TB uncompressed data to existing JD/JZ/JL media.
It can read/write existing 10TB-formatted JD media, and 7TB-formatted JC media, written by former TS1150 drives. It also can offer read-only support for older 4TB-formatted JC media from TS1140 drives.
These are uncompressed capacities, and some clients achieve 2x or 3x compression on top of these capacities. This depends heavily on the type of data. Your mileage may vary, as they say.
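To put numbers on "your mileage may vary", here is a rough sketch that multiplies the uncompressed cartridge capacities mentioned above by the 2x and 3x compression ratios some clients report. The dictionary labels and the helper name are my own, for illustration only, and the ratios are examples, not guarantees.

```python
# Uncompressed cartridge capacities in TB, as quoted in this post.
NATIVE_CAPACITY_TB = {
    "TS1155 format": 15,
    "TS1150 format on JD media": 10,
    "TS1150 format on JC media": 7,
    "TS1140 format on JC media (read-only on TS1155)": 4,
}


def effective_capacity_tb(native_tb: float, compression_ratio: float) -> float:
    """Estimate how much user data fits on one cartridge at a given ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return native_tb * compression_ratio


for label, native_tb in NATIVE_CAPACITY_TB.items():
    at_2x = effective_capacity_tb(native_tb, 2)
    at_3x = effective_capacity_tb(native_tb, 3)
    print(f"{label}: {native_tb} TB native, {at_2x:.0f} TB at 2x, {at_3x:.0f} TB at 3x")
```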
Most of the rest of the features of the TS1150 drives carry forward. The performance of 360 MB/sec is similar, encryption via IBM Security Key Lifecycle Manager (SKLM) is similar, and support for IBM Spectrum Archive via Linear Tape File System (LTFS) format is similar.
An interesting development is that the TS1155, in addition to standard 8Gb Fibre Channel attach, is the first IBM enterprise drive to also offer 10Gb Ethernet support. IBM will offer both RDMA over Converged Ethernet (RoCE) as well as iSCSI support.
The newest member of the IBM Spectrum Storage software family, IBM Spectrum Copy Data Management automates the creation of snapshot images (FlashCopy for those familiar with IBM terminology) on IBM, NetApp and EMC storage arrays. These copies can be made for various uses, such as DevOps, Dev/Test, Backup/Restore, and Disaster Recovery.
At some data centers, these copies can consume as much as 60 percent of your total storage space, because each developer and tester often generates their own copies. Having copies automated, registered, cataloged, and made available to developers and testers eliminates these rogue copies.
This release adds support for additional databases, including Microsoft SQL Server on physical machines, SAP HANA in-memory databases, and Epic/Caché from InterSystems used in Electronic Health Records (EHR) management systems.
IBM also adds support for long-distance vMotion for VMware virtual machine images. The target for this movement is IBM Spectrum Accelerate running on IBM Bluemix Cloud, supporting Hybrid Cloud configurations.
Over the past ten years, my co-workers have asked to write a "guest post" on this blog. This time, Moshe Weiss, IBM Senior Manager, Development and Design, has offered the following post, not in his own voice, but in the voice of his "baby", the Hyper-Scale Manager software.
You might think this is a strange approach, but today we have robots that can dance, and cars that can drive themselves! If software could talk, this is what IBM Hyper-Scale Manager would say:
"I was born a year ago.
It wasn't an easy birth… there were many complications. In fact, so many, that I was almost prematurely born!
Most of my development, in preparation for labor and delivery, was done within the last 6 months of the overall 18 months. I was shaped and designed, and sometimes re-shaped, three times. Lots of assumptions had to be made in hopes to ease a successful delivery and help bring me to full term of the birthing process.
During my first year of maturity, I focused on learning how customers used me; what frustrated them the most, and what they loved or 'almost' loved, while still needing refinement and redesign.
The number of customers adopting me grew higher and higher, as did the number of complaints and bugs that I had to deal with, and my users’ frustrations and dislikes because I wasn't yet a complete solution and still had some missing features.
I was renewed four times! Each renewal improved me and made my senses better and faster, adding new capabilities that helped make me more approachable, intuitive and delightful.
Choosing how to renew, and what to add to each renewal, is not an easy task. Basically, it was about prioritizing user experience versus gaps that were deferred from my birth, versus differentiators to make me unique and sell more, versus features in my roadmap, versus investing huge efforts in my quality.
Each renewal was a complex process with lots of features and behaviors to add, while trying to make my customers’ life a bit easier, since features that were important to them were sometimes considered low priority.
But, there were also good times during my first year:
Huge customer adoption rate
100 new customers in two months!
Growing was a great thing and my parents were and are still so proud! But, like with most things, it came with a price - a lot of sustain issues from the field, requests for changes and bad feedback that I am hard to use and missing core elements.
Being a new baby in the Storage world is not a simple thing, as expectations are huge (mainly because of my successful elder brother, the XIV GUI) and I must quickly keep up with all of them.
Although, I am getting tons of good feedback for being revolutionary and unique. People are emotionally engaged with me, and being that I’m a baby, I love to see emotions!
Huge marketing efforts to put me center stage
However, because of some initial problems at the start -- I am a new product, remember? -- I was thrown out of multiple customer sites, and some sales/marketing guys just stopped believing in me. That made me sad.
My parents did a great job, though, in talking, explaining and demonstrating what I can do, together with what I can’t do now, but will do soon. This really helped in some areas, and customers began to see what my parents saw in me for so many years.
I’m really enthusiastic to hear what people will think of me when I’m two years old!
As part of the renewal I had four times during my first year, design elements were reconsidered, redesigned and rewritten to find the best solutions ever. No product has come even close to what I suggest to the world… I am so proud of myself!
Additionally, my parents wrote approximately 20 patents on my User Interface (UI) elements and User Experience (UX) concepts, which makes me extremely unique.
Prioritizing what goes in and what doesn't was a real challenge, especially during a year when fewer and fewer babysitters handled me. Read my parent's post [How to drive forward an exhausted team?] for more details.
But my parents did it! They succeeded in adding cool features like:
Filter analytics and free text, making the filter a great experience that everyone is using.
Great UX improvements like redesigning the tabs, adding right click menus, and adding more on-boarding enablers
Improving the dashboard.
Improving my core business, capacity management (four different times!), and still working on it.
Adding features that were initially deferred in my birth. Deferring features back then was the way to make my birth go smoother. Now, these missing features annoy people.
Improving quality dramatically, adding automation to the way people test me.
Adding differentiators, like the health widget, with more than 20 best practices that provide helpful tips to the customer when there’s a need to change something in their environment, to avoid future issues.
Continue to bring added values for the 'A-family'. I am monitoring: FlashSystem A9000/R, XIV and Spectrum Accelerate, both on and off premises. This added value makes for a family with the most powerful management solutions and experience."
If you are planning to attend the upcoming IBM Systems Technical University in Orlando, Florida, May 22-26, there will be a variety of hands-on labs. I recommend participating in the hands-on session to feel and witness the next release of IBM Hyper-Scale Manager.
This week, I was part of an all-day event called "Healthcare and Research Trends & Directions in a Cognitive World" at the IBM Executive Briefing Center (EBC) in Rochester, MN. I was one of many presenters covering Information Technology to improve healthcare outcomes. Todd Stacy, IBM Director Server Sales for US Public Market, served as our emcee.
This was a great day. Special thanks to Kathy Lehr, Trish Froeschle, and Scott Gass for organizing this event! We had clients from a variety of Health Care and Life Science industry backgrounds. I certainly learned a few things myself.
Dr. Michael Weiner, IBM Chief Medical Information Officer, Watson Health, covered some of the real challenges facing not just the United States, but also other countries. On average, healthcare in the USA [costs over $10,000 USD per American citizen]! Compare that to only $3,700 USD for the folks in the United Kingdom! In fact, nearly all industrial nations spend between $2,000 and $5,000 per person. Where does all the U.S. money go?
A big challenge is our ever-aging population. Every day, there are 10,000 [Baby Boomers] reaching their 65th birthday, with fewer people in the 25-44 age group to work as nurses to take care of them. About 15 percent of the US population is elderly (over age 65) and this is expected to grow to 20 percent by the year 2040. The situation is even worse in Japan, where 25 percent of the population today is elderly, and this is expected to be 40 percent by the year 2060.
New Care Models
In some countries, like Australia and Japan, postal workers who once spent their time delivering mail can now stop in to check on elderly people. As people ship less mail, using social media or email instead, this keeps the postal workers employed in a manner that provides value to society.
The USA enjoys one of the lowest costs for food, but then suffers from an epidemic of obesity, with over 34 percent of Americans obese. When New York City eliminated Trans Fats, heart attacks dropped considerably.
In 2009, the Health Information Technology for Economic and Clinical Health [HITECH] Act required the digitization of medical information, known as "Meaningful Use", which has greatly influenced healthcare facilities. This was implemented by a combination of incentives and penalties. Now, more than 92 percent of hospitals in the USA have digitized medical information! The rest are still using paper and X-ray film images. Some places were initially exempted, such as Assisted Living Homes for example, so there is still more work to be done.
An advantage of using computer-based solutions like Artificial Intelligence is that it eliminates bias. When a woman walks into an Emergency Room complaining about chest pains, few health staff would consider this a sign of heart attack. When a man does the same, health staff consider heart attack as the first diagnosis, at the risk of missing out on other possibilities.
Every year, over a million articles related to healthcare research are published. Who can read all this in a timely manner? IBM Watson! After [winning in Jeopardy], IBM Watson was "sent to medical school" to learn how to assist doctors in diagnosing patients.
Transforming Health Care Data Management with IBM Spectrum Storage
Greg Tevis, IBM Software Defined Storage Architect, and Raj Tandon, IBM Senior Strategist, co-presented this introduction to the IBM Spectrum Storage family of products. They covered examples with IBM Spectrum Virtualize, IBM Spectrum Control, IBM Spectrum Protect, IBM Spectrum Scale, IBM Cloud Object Storage, and IBM Spectrum Copy Data Management. The latter directly supports Epic and Caché databases.
Cognitive Imaging Solutions for Healthcare Providers
Jason Crites, IBM Healthcare and Life Sciences Data Solutions Leader, and Wayland Vacek, Enterprise Sales Manager for Merge, presented IBM Watson Imaging Clinical Review, from IBM's acquisition of the Merge company. The solution is based on IBM Spectrum Scale as the back-end storage repository.
Merge has been around for more than 20 years, with clinical workflow offerings in Cardiology, Radiology, Orthopedics and Eye care. Often, IBM Watson is able to identify things in medical images that escape the review of radiologists or other medical specialists.
At the HIMSS conference earlier this year, human radiologists were shown a collection of images used to train IBM Watson. The human radiologists only identified 20 percent of the images correctly, while IBM Watson got all of them, every time. In many cases, human radiologists have only a few seconds to look at an X-ray image. Computers like IBM Watson are now fast enough to compete directly with human radiologists in the same number of seconds.
Building a Foundation for the Cognitive Era in Healthcare and Life Sciences
Dr. Jane Yu, IBM Systems Architect, Healthcare & Life Sciences, and Dr. Frank Lee, IBM Global Sales Leader, IBM Software Defined Infrastructure & Life Sciences, co-presented this topic. They presented five challenges:
Growing data volumes are making it more difficult to manage, process and store this data.
Scientists find themselves spending more than 80 percent of their time manually integrating data from silos, and less than 20 percent of their time doing actual research and deriving insights from their analyses.
Compute- and data-intensive workflows may take days to complete on existing server and storage systems.
IT organizations must keep up with rapidly evolving applications, development frameworks, and databases for preferred Healthcare and Life Sciences (HCLS) applications. This includes SAS, Matlab, Hadoop, Spark, NoSQL databases, as well as Deep Learning and Machine Learning workloads.
Scientific integrity and government mandates increasingly require collaboration across organizational boundaries.
In one example, Sidra Medical and Research Center plans to map the genomes of all 250,000 citizens in the Middle Eastern country of Qatar. Imagine that processing each Qatari citizen will generate 200 GB of data for this project, resulting in 50 Petabytes (PB) of data!
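The 50 PB figure is easy to check. Here is a quick sketch of the arithmetic, assuming decimal units (1 PB = 1,000,000 GB). The variable and function names are my own, for illustration only.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 1,000 TB = 1,000,000 GB


def total_petabytes(genomes: int, gb_per_genome: float) -> float:
    """Return the total data volume in PB for a genome-mapping project."""
    return genomes * gb_per_genome / GB_PER_PB


qatar_pb = total_petabytes(genomes=250_000, gb_per_genome=200)
print(f"250,000 genomes at 200 GB each is {qatar_pb:.0f} PB")
```

This counts only the primary data. Any backup copies or replicas for disaster recovery would come on top of it.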
Combining IBM Spectrum Compute products with IBM Spectrum Scale storage can help address these challenges.
Modernize & Transform Healthcare with IBM Storage Solutions
Finally, I presented a 90-minute breakout session that covered three solution areas:
Flash storage to speed up medical records and research. Those who have already implemented Electronic Health Records (EHR) for "Meaningful Use" compliance recognize the value this provides to improving healthcare. Adding All-Flash Arrays such as IBM FlashSystem, Storwize V7000F or DS8000F can drastically improve application performance.
Spectrum Scale and IBM Cloud Object Storage for Vendor Neutral Archive. It seems silly that each PACS vendor has its own little island of storage. A better approach is to send all PACS data from various vendors into a "Vendor-Neutral" storage repository. Both IBM Spectrum Scale and IBM Cloud Object Storage System, either linked together or used separately, can be part of a VNA solution.
VersaStack to simplify deployments. VersaStack is a Converged System that combines best-of-breed Cisco servers and switches with best-of-breed IBM storage, pre-cabled, pre-configured, and pre-loaded with all the necessary software to manage the environment as a single entity. This can reduce the time it takes to deploy new medical applications from weeks to just hours.
Next month, I will be presenting at the IBM Systems Technical University in Orlando, Florida, May 22-26, 2017. There will not be an "IBM Edge" conference this year, so this is your best opportunity to hear the latest information on all of the IBM server and storage products at one conference.
I will be there! Here are the topics I will be presenting:
The pendulum swings back -- Understanding Converged and Hyperconverged environments
IBM cloud storage options
Software Defined Storage -- Why? What? How?
Business continuity -- The seven tiers of business continuity and disaster recovery
Introduction to object storage and its applications - Cleversafe
New generation of storage tiering: Less management, lower investment and increased performance
IBM Spectrum Scale for file and object storage
This conference is not all lectures, which some refer to as "Death by Powerpoint".
There will also be a variety of hands-on labs. I recommend participating in the hands-on session to feel and witness the next release of IBM Hyper-Scale Manager, which is the management application for what IBM calls its A-line storage family -- FlashSystem A9000/R, XIV Storage System, and Spectrum Accelerate software.
Hyper-Scale Manager is the most advanced GUI in the market today, and may help cut your management total cost of ownership (TCO) in half!
You can [Enroll Today!] There is an "early-bird" special to save hundreds of dollars if you enroll by April 16!