This blog is for the open exchange of ideas relating to IBM Systems, storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
Tony Pearson is a Master Inventor, Senior IT Architect and Event Content Manager for [IBM Systems for IBM Systems Technical University] events. With over 30 years with IBM Systems, Tony is frequent traveler, speaking to clients at events throughout the world.
Lloyd Dean is an IBM Senior Certified Executive IT Architect in Infrastructure Architecture. Lloyd has held numerous senior technical roles at IBM during his 19 plus years at IBM. Lloyd most recently has been leading efforts across the Communication/CSI Market as a senior Storage Solution Architect/CTS covering the Kansas City territory. In prior years Lloyd supported the industry accounts as a Storage Solution architect and prior to that as a Storage Software Solutions specialist during his time in the ATS organization.
Lloyd currently supports North America storage sales teams in his Storage Software Solution Architecture SME role in the Washington Systems Center team. His current focus is with IBM Cloud Private and he will be delivering and supporting sessions at Think2019, and Storage Technical University on the Value of IBM storage in this high value IBM solution a part of the IBM Cloud strategy. Lloyd maintains a Subject Matter Expert status across the IBM Spectrum Storage Software solutions. You can follow Lloyd on Twitter @ldean0558 and LinkedIn Lloyd Dean.
Tony Pearson's books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
This week, I am in Las Vegas for [Edge 2016], IBM's Premiere IT Infrastructure conference of the year. In previous years, this conference was held in May, June or July, but this year, it was moved back to September, to coincide with the 60th Anniversary of IBM Disk Systems.
I have arrived safely to Las Vegas, and checked in at Edge 2016 Conferenece Registration.
This year, the Solutions EXPO opens early, on Sunday with a reception. This gives people a chance to go to booth #330 to make appointments for one-on-one with various IBM Executives!
I was able to catch up with co-workers I have not seen in a while! There is a whole section on IBM storage products such as the IBM DS8888 All-Flash Array, as well as software products like IBM Spectrum Protect and IBM Spectrum Control.
On Monday, my session "All Flash is Not Created Equal: Tony Pearson Contrasts IBM FlashSystem and SSD" has moved from the tiny room to a much larger room "Studio A". There was a lot of demand for this session, so I have agreed to present this again, as a repeat session, on Wednesday.
Edge will be different in many ways this year. The past few years we had separate "Executive Edge" for C-level executives, "Winning Edge" for IBM Business Partners, and "Technical Edge" for server, network and storage administrators.
This year, all 1,000 sessions are combined back into one, but with clever hints in the titles. The words "General Session", "Outthink" or "Cognitive" are used to indicate C-level executive talks. Those that use the terms "Winning" or "Community" target IBM Business Partners, Managed Service Providers and Cloud Service Providers. Those that mention z Systems, POWER servers, or Storage solutions, often adding the term "Deep-Dive", are technical.
(Unlike other sessions that might appeal to one portion of the audience or another, mine are suitable for everyone, from C-level executives and IBM Business Partners to storage administrators. To help people find them under the new naming scheme, I have added "Tony Pearson Presents", or words to that effect.)
About 260 breakout sessions relate to IBM Storage, but there are only 20 or so time slots, so obviously you can't see them all in person.
I strongly suggest you pick about three to five topics per time slot, so that you are not overwhelmed by the dozens of choices during the event. This allows you to make a quick decision on which one you finally decide on during each time slot.
Occasionally, a session might get canceled, postponed, or be so full of attendees that nobody else is allowed in, so having three to five topics selected allows you to chose an alternate.
Here is my schedule for next week at Edge 2016.
Trends & Directions: The Future of Storage in the Cloud and Cognitive Era
All Flash is Not Created Equal: Tony Pearson Contrasts IBM FlashSystem and SSD
MGM Grand - Studio 9
Solution EXPO: Reception
Edge at Night: Poolside Reception and Concert "Train"
Tony Pearson Presents IBM Cloud Object Storage System and Its Applications
MGM Grand - Room 114
The Pendulum Swings Back: Tony Pearson Explains Converged and Hyperconverged Environments
MGM Grand - Room 113
Solution EXPO: Reception
Tony Pearson Presents IBM's Cloud Storage Options
MGM Grand - Room 116
My colleagues Dave Dabney or Adam Bergren will be located at the WW Systems Client Centers Booth 125 of the Solution EXPO.
If you are active in Social Media, consider using the hashtags #IBMedge, #IBMstorage, and #IBMcloud. You can follow me on Twitter, my handle is @az990tony
For those interested in a one-on-one meeting with me, over breakfast, lunch or dinner, or some other time, I have several slots still available. Fill out a request form on BriefingSource at: [https://briefingsource.dst.ibm.com/]
SAP HANA is an in-memory, relational database management system supported on Linux for x86 and POWER servers. The "HANA" acronym is short for "High-Performance Analytic Appliance" software. By keeping the data in memory, analytics and queries can be performed much faster than from traditional disk repositories.
Server memory, however, is volatile storage, so the data needs to be stored on persistent storage such as flash or disk drives. SAP has certified several configurations, some involve IBM Spectrum Scale solutions. I will use the following graphic to explain the three configurations.
Linux on x86-64 with Spectrum Scale FPO
With SAP HANA on Lenovo x86-64 servers, SAP has certified internal flash or disk drives running IBM Spectrum Scale in "File Placement Optimization" (FPO) mode. FPO provides a shared-nothing architecture that matches the SAP HANA architecture. IBM Spectrum Protect can backup this configuration, providing data protection and disaster recovery support.
Linux on POWER with Elastic Storage Server
With SAP HANA on POWER servers, SAP has certified external Elastic Storage Server (ESS). Not only is POWER the better platform to run SAP HANA than x86-64, but Elastic Storage Server offers excellent erasure coding to provide excellent rebuild times and storage efficiency.
The ESS is a pre-built system that combines IBM Spectrum Scale software with server and storage hardware. IBM Spectrum Protect can also backup this configuration, providing data protection and disaster recovery support.
Block-level Storage over Storage Area Network (SAN)
Various IBM block-level devices are support for SAP HANA on both Linux on x86-64 and Linux on POWER. Unfortunately, SAP only has certified (to date) the use of the XFS file system. The problem many clients mention about this configuration is the lack of end-to-end backup and disaster recovery. This is solved by the Spectrum Scale configurations in the previous two examples.
Other combinations, such as SAP HANA on POWER with Spectrum Scale FPO, or on x86-64 servers with Elastic Storage Serer, are either not SAP-certified, or not directly supported by SAP without their approval.
IBM and SAP have worked closely together for many years, and I am glad to see SAP HANA and IBM Spectrum Scale based solutions continue this tradition.
As we get to larger and larger flash and spinning disk drives, a common question I get is whether to use RAID-5 versus RAID-6. Here is my take on the matter.
A quick review of basic probability statistics
Failure rates are based on probabilities. Take for example a traditional six-sided die, with numbers one through six represented as dots on each face. What are the chances that we can roll the die several times in a row, that we will have no sixes ever rolled? You might think that if there is a 1/6 (16.6 percent) chance to roll a six, then you would guarantee hit a six after six rolls. That is not the case.
# of Rolls
Probability of no sixes (percent)
So, even after 24 rolls, there is more than 1 percent chance of not rolling a six at all. The formula is (1-1/6) to the 24th power.
Let's say that rolling one to five is success, and rolling a six is a failure. Being successful requires that no sixes appear in a sequence of events. This is the concept I will use for the rest of this post. If you don't care for the math, jump down to the "Summary of Results" section below.
Error Correcting Codes (ECC) and Unreadable Read Errors (URE)
When I speak to my travel agent, I have to provide my six-character [Record Locator] code. Pronouncing individual letters can be error prone, so we use a "spelling alphabet".
The International Radiotelephony Spelling Alphabet, sometimes known as the [NATO phonetic alphabet], has 26 code words assigned to the 26 letters of the English alphabet in alphabetical order as follows: Alfa, Bravo, Charlie, Delta, Echo, Foxtrot, Golf, Hotel, India, Juliett, Kilo, Lima, Mike, November, Oscar, Papa, Quebec, Romeo, Sierra, Tango, Uniform, Victor, Whiskey, X-ray, Yankee, Zulu.
Foxtrot Golf Mike Oscar Victor Whiskey
Foxtrot Gold Mine Oscar Vector Whisker
Boxcart Golf Miko Boxcart Victor Whiskey
Having five or so characters to represent a single character may seem excessive, but you can see that this can be helpful when communications link has static, or background noise is loud, as is often the case at the airport!
If spelling words are misheard, either (a) they are close enough like "Gold" for "Golf", or "Whisker" for "Whiskey", that the correct word is known, or (b) not close enough, such that "Boxcart" could refer to either "Foxtrot" or "Oscar" that we can at least detect that the failure occurred.
For data transfers, or data that is written, and later read back, the functional equivalent is an Error Correcting Code [ECC], used in transmission and storage of data. Some basic ECC can correct a single bit error, and detect double bit errors as failures. More sophisticated ECC can correct multiple bit errors up to a certain number of bits, and detect most anything worse.
When reading a block, sector or page of data from a storage device, if the ECC detects an error, but is unable to correct the bits involved, we call this an "Unrecoverable Read Error", or URE for short.
Bit Error Rate (BER)
Different storage devices have different block, sector or page sizes. Some use 512 bytes, 4096 bytes or 8192 bytes, for example. To normalize likelihood of errors, the industry has simplified this to a single bit error rate or BER, represented often as a power of 10.
Bit Error Rate per read (BER)
Consumer HDD (PC/Laptops)
Enterprise 15k/10k/7200 rpm
Solid-State and Flash
IBM TS1150 tape
In other words, the chance that a bit is unreadable on optical media is 1 in 10 trillion (1E13), on enterprise 15k drives is 1 in 10 quadrillion, and on LTO-7 tape is 1 in 10 quintillion.
There are eight bits per byte, so reading 1 GB of data is like rolling the die eight billion times. The chance of successfully reading 1GB on DVD, then would be (1 - 1/1E13) to the 8 billionth power, or 99.92 percent, or conversely a 0.08 percent chance of failure.
In this paper, Google had studied drive failure using an "Annual Failure Rate" or AFR. Here are two graphs from this paper:
This first graph shows AFR by age. Some drives fail in their first 3-6 months, often called "infant mortality". Then they are fairly reliable for a few years, down to 1.7 percent, then as they get older, they start to fail more often, up to 8.3 percent.
This second graph factors in how busy the drives are. Dividing the drive set into quartiles, "Low" represents the least busy drives (the bottom quartile), "Medium" represents the median two quartiles, and "High" represents the busiest drives, the top quartile. Not surprisingly, the busiest drives tend to fail more often than medium-busy drives.
Given an AFR, what are the chances a drive will fail in the next hour? There are 8,766 hours per year, so the success of a drive over the course of a year is like rolling the die 8,766 times. This allows us to calculate a "Drive Error Rate" or DER:
Drive Error Rate per hour (DER)
For example, an AFR=3 drive has a 1 in 287,800 chance of failing in a particular hour. The probability this drive will fail in the next 24 hours would be like rolling the die 24 times. The formula is (1-1/287,800) to the 24th power, resulting in a failure rate of roughly 0.008 percent.
Let's take a typical RAID-5 rank with 600GB drives at 15K rpm, in a 7+P RAID-5 configuration.
During normal processing, if a URE occurs on a individual drive, RAID comes to the rescue. The system can rebuild the data from parity, and correct the broken block of data.
When a drive fails, however, we don't have this rescue, so a URE that occurs during the rebuild process is catastrophic. How likely is this? Data is read from the other seven drives, and written to a spare empty drive. At 8 bits per byte, reading 4200 GB of data is rolling the die 33.6 trillion times. The formula is then (1-1/E16) to the 33.6 trillionth power, or approximately 0.372 percent chance of URE during the rebuild process.
The time to perform the rebuild depends heavily on the speed of the drive, and how busy the RAID rank is doing other work. Under heavy load, the rebuild might only run at 25 MB/sec, and under no workload perhaps 90 MB/sec. If we take a 60 MB/sec moderate rebuild rate, then it would take 10,000 seconds or nearly 3 hours. The chance that any of the seven drives fail during these three hours, at AFR=10 rolling the DER die (7 x 3) 21 times, results in a 0.025 percent chance of failure.
It is nearly 15 times more likely to get a URE failure than a second drive failure. A rebuild failure would happen with either of these, with a probability of 0.397 percent.
The situation gets worse with higher capacity Nearline drives. Let's do a RAID-5 rank with 6TB Nearline drives at 7200 rpm, in a 7+P configuration. The likelihood of URE reading 42 TB of data, is rolling the die 336 trillion times, or approximately 3.66 percent chance of URE failure. Yikes!
The time to rebuild is also going to take longer. A moderate rebuild rate might only be 30 MB/sec, so that rebuilding a 6TB drive would take 55 hours. The chance that one of the other seven drives fail, assuming again AFR=10, during these 55 hours results in a 0.462 percent.
This time, a URE failure is nearly eight times more likely than a double drive failure. The chance of a rebuild failure is 4.12 percent. Good thing you backed up to tape or object storage!
The math can be done easily using modern spreadsheet software. The URE failure rate is based on the quantity of data read from the remaining drives, so a 4+P with 600GB drives is the same as 8+P with 300GB drives. Both read 2.4 TB of data to recalculate from parity. The Double Drive failure rate is based on the number of drives being read times the number of hours during the rebuild. Slower, higher capacity drives take longer to rebuild. However, in both the 15K and 7200rpm examples, the chance of a URE failure was 8 to 15 times more likely than double drive failure.
Many of the problems associated with RAID-5 above can be mitigated with RAID-6.
After a single drive fails, any URE during rebuild can be corrected from parity. However, if a second drive fails during the rebuild process, then a URE on the remaining drives would be a problem.
Let's start with the 600GB 15k drives in a 6+P+Q RAID-6 configuration. The chance of a second drive failing is 0.0252 percent, as we calculated above. The likelihood of a URE is then based on the remaining six drives, 3600 GB of data. Doing the math, that is 0.0319 percent chance. So, the change of a URE during RAID-6 failure is the probability of both occurring, roughly 0.0000806 percent. Far more reliable than RAID-5!
Likewise, we can calculate the probability of a triple drive failure. After the second drive fails, the likelihood of a third drive at AFR=10, results in 0.00000546 percent.
Combining these, the chance of failure of rebuild is 0.000861 percent.
Switching to 6 TB Nearline drives, in a 6+P+Q RAID-6 configuration, we can do the math in the same manner. The likelihood of URE and two drives failing is 0.0145 percent, and for triple drive failure is 0.00183 percent. Chance of rebuild failure is 0.0163 percent.
Summary of Results
Putting all the results in a table, we have the following:
RAID-5 rebuild failure (percent)
RAID-6 rebuild failure (percent)
600GB 15K rpm
6 TB 7200rpm
Hopefully, I have shown you how to calculate these yourself, so that you can plug in your own drive sizes, rebuild rates, and other parameters to convince yourself of this.
In all cases, RAID-6 drastically reduced the probability of rebuild failure. With modern cache-based systems, the write-penalty associated with additional parity generally does not impact application performance. As clients transition from faster 15K drives to slower, higher capacity 10K and 7200 rpm drives, I highly recommend using RAID-6 instead of RAID-5 in all cases.
As I have mentioned before, I started this blog on September 1, 2006 as part of IBM's big ["50 Years of Disk Systems Innovation"] campaign. IBM introduced the first commercial disk system on September 13, 1956 and so the 50th anniversary was in 2006. That means this month, IBM celebrates the "Diamond" anniversary, 60 years of Disk Systems!
For those who missed it, IBM announced last Tuesday encryption capability for the TS1120 drive, our enterprise tape drive that read and write 3592 cartridges. Do you need special cartridges for this? No! Use the sames ones you have already been using!
You can read more about it www.ibm.com/storage/tape."
Short and sweet, but it got me started, and I ended up writing 21 blog posts that first month. You can read blog posts from all 10 years by looking at the left panel of my blog under "Archive".
While traditional disk and tape storage are still very important and relevant in today's environment, IBM has also expanded into other technologies:
In 2012, IBM [acquired Texas Memory Systems]. In 2014, IBM shipped 62PB, more Flash capacity than any other vendor. In 2015, continued its #1 status, shipping 170PB of Flash, again, more than any other vendor.
IBM has flash everywhere, from the advanced FlashSystem 900, V9000, A9000 and A9000R models, to other all-flash array and hybrid flash-and-disk systems a with various sets of features and functions to meet a variety of workload requirements.
The DS8888 all-flash array, and the DS8886 and DS8884 hybrid flash-and-disk systems round out the latest in the DS8000 storage systems family. SAN Volume Controller and Storwize family of products, based on IBM Spectrum Virtualize software, also have all-flash array and hybrid configurations. The most recent being the Gen2+ models of Storwize V7000F and V5030F. The latest solution is the DeepFlash 150 models, designed for analytics and unstructured data.
Between internally-developed IBM Spectrum Scale and IBM Spectrum Archive, and IBM's [acquisition of Cleversafe], IBM is ranked #1 in Object Storage. IBM Cloud Object Storage System, IBM's new name for Cleversafe's flagship product, is available as software-only, pre-built systems, or in the IBM SoftLayer cloud.
Software-Defined Storage (SDS) with IBM Spectrum Storage
Last year, IBM re-branded its various storage software products under the "IBM Spectrum Storage" family. Earlier this year, IBM announced the new [IBM Spectrum Storage Suite license] which makes it even easier to procure, either with a perpetual software license, elastic monthly licensing, or utility license that combines some of each.
IBM is ranked #1 in Software-Defined Storage, with over 40 percent marketshare, offering solutions as Software-only, pre-built systems, and in IBM SoftLayer cloud.