Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
Well, it's Tuesday, and that means IBM announcements!
IBM kicks EMC in the teeth with the announcement of System Storage Easy Tier, a new feature available at no additional charge on the DS8700 with the R5.1 level microcode. Barry Whyte introduces the concept in his [post this morning]. I will use SLAM (sub-LUN automatic movement) to refer generically to IBM Easy Tier and EMC FAST v2. EMC has yet to deliver FAST v2, and given that they just recently got full-LUN FAST v1 working a few months ago, it might be next year before you see EMC sub-LUN FAST v2.
Here are the key features of Easy Tier on the DS8700:
Sub-LUN Automatic Movement
IBM made it really easy to implement this on the DS8700. Today, you have "extent pools" that can be either SSD-only or HDD-only. With this new announcement, we introduce "mixed" SSD+HDD extent pools. The hottest extents are moved to SSD, and cooler extents are moved down to HDD. The support applies to both Fixed Block Architecture (FBA) LUNs and Count-Key-Data (CKD) volumes. In other words, an individual LUN or CKD volume can have some of its 1GB extents on SSD and other extents on FC or SATA disk.
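To make the idea concrete, here is a toy sketch of sub-LUN tiering in Python. The extent heat metric, capacity figures, and function names are my own illustration, not IBM's actual Easy Tier algorithm, which considers far more than raw I/O counts:

```python
# Hypothetical sketch: rank extents by "heat" (I/O activity) and promote
# the hottest ones to the SSD tier, within the SSD capacity budget.

EXTENT_SIZE_GB = 1  # DS8700 extents are 1GB

def plan_migrations(extent_heat, ssd_capacity_gb):
    """Given {extent_id: io_count}, pick the hottest extents that fit on SSD."""
    ranked = sorted(extent_heat, key=extent_heat.get, reverse=True)
    budget = ssd_capacity_gb // EXTENT_SIZE_GB
    hot = set(ranked[:budget])    # promote these extents to SSD
    cold = set(ranked[budget:])   # leave (or demote) these on HDD
    return hot, cold

heat = {"e1": 900, "e2": 50, "e3": 700, "e4": 10}
hot, cold = plan_migrations(heat, ssd_capacity_gb=2)
print(sorted(hot))   # ['e1', 'e3']
```

In practice the DS8700 samples access patterns over time and migrates extents gradually in the background, so a real implementation would also need hysteresis to avoid thrashing extents between tiers.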
Entire-LUN Manual Relocation
Entire-LUN Manual Relocation (ELMR, pronounced "Elmer"?) is similar to what EMC offers now with FAST v1. With this feature, you can now relocate an entire LUN non-disruptively from any extent pool to any other extent pool. You can relocate LUNs from an SSD-only or HDD-only pool over to a new Easy Tier-managed "mixed" pool, or take a LUN out of Easy Tier management by moving it to an SSD-only or HDD-only pool. Of course, this support also applies to both Fixed Block Architecture (FBA) LUNs and Count-Key-Data (CKD) volumes.
This feature also can be used to relocate LUNs and CKD volumes from FC to SATA pools, from RAID-10 to RAID-5 pools, and so on.
What if you already have SSD-only and HDD-only pools and want to use Easy Tier? You can now merge pools to create a "mixed" pool.
Before this announcement, you had to buy 16 solid-state drives at a time, called Mega-packs. Now, you can choose to buy just eight SSDs at a time, called Mini-packs. It turns out that moving as little as 10 percent of your data from Fibre Channel disk over to solid-state with Easy Tier can result in a 300 to 400 percent performance improvement. IBM plans to publish formal SPC-1 benchmark results using an Easy Tier-managed mixed extent pool in a few weeks.
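The "move 10 percent of data, gain 300 to 400 percent" claim makes sense when I/O access is highly skewed. Here is an Amdahl-style back-of-the-envelope estimate; the skew and acceleration figures are my own illustrative assumptions, not IBM's SPC-1 methodology:

```python
def tiering_speedup(hot_io_fraction, ssd_accel):
    """Estimate overall speedup when hot_io_fraction of all I/Os land on SSD
    that services them ssd_accel times faster than spinning disk."""
    new_time = hot_io_fraction / ssd_accel + (1 - hot_io_fraction)
    return 1 / new_time

# If the hottest 10% of extents absorb 80% of all I/O, and SSD is 20x faster:
print(round(tiering_speedup(0.80, 20), 1))  # 4.2, i.e. a 300+ percent improvement
```

The key insight is that the benefit depends on how much of the I/O, not how much of the capacity, lands on SSD; that is exactly why skewed workloads benefit so much from sub-LUN movement.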
Storage Tier Advisor Tool (STAT)
Don't have SSD yet, or not sure how awesome Easy Tier will be for your data center? The IBM Storage Tier Advisor Tool will analyze your extents and estimate how much benefit you will derive if you implement Easy Tier with various amounts of SSD. Clients with R5.1 microcode on their DS8700 can download the tool from the [DS8700 FTP site].
Well, it's Tuesday again, and you know what that means? IBM Announcements!
This week, IBM announces the second generation of Storwize V5000 flash and disk storage systems. There are the V5000F all-flash configurations, as well as the V5000 models that support a variety of flash and spinning disk drives.
There are three models:
The V5010 has dual 2-core/2-thread processors and 16GB of cache. It supports thin provisioning, FlashCopy, Easy Tier, and remote mirroring. The base unit includes 1 GbE ports for iSCSI host connectivity, with options to add 16 Gb Fibre Channel, 12 Gb SAS, and 10 GbE iSCSI/FCoE as well.
The 2U controllers and expansion enclosures can hold either 24 small 2.5-inch drives, or 12 larger 3.5-inch drives. A single control enclosure has two active/active IBM Spectrum Virtualize nodes, and can attach up to 10 expansion enclosures for a maximum of 264 drives.
The V5020 unit has dual 2-core/4-thread processors and up to 32GB of cache. It supports everything the V5010 does, plus encryption. The encryption is done via the Intel AES-NI instruction set to eliminate the need for special "self-encrypting drives" (SED) that other storage devices may require.
The V5030 has dual 6-core/4-thread processors and up to 64GB of cache. It supports everything the V5010 and V5020 do, plus Real-time Compression and external virtualization. The Real-time Compression can achieve up to 80 percent space savings, representing a 5:1 compression ratio.
Each control enclosure can attach to 20 expansion enclosures, which can support 504 internal drives per controller, and up to 1,008 with two controllers (four Spectrum Virtualize nodes) clustered together. This is in addition to the drives in external storage systems virtualized.
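The "80 percent savings, 5:1 ratio" figures quoted for the V5030's Real-time Compression are two ways of stating the same thing, and converting between them is simple arithmetic (this sketch is my own illustration, not a measurement of Real-time Compression itself):

```python
def savings_to_ratio(savings_pct):
    """80% savings means data occupies 20% of its original size: 1/0.2 = 5:1."""
    return 1 / (1 - savings_pct / 100)

def ratio_to_savings(ratio):
    """Inverse conversion: a 5:1 ratio means 1/5 of the space, i.e. 80% saved."""
    return (1 - 1 / ratio) * 100

print(round(savings_to_ratio(80), 1))  # 5.0  -> the 5:1 ratio quoted for the V5030
print(round(ratio_to_savings(2), 1))   # 50.0 -> a 2:1 ratio saves half the space
```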
Yesterday, I promised I would cover other products from the Feb 12 announcement. Today I will focus on the IBM SAN768B director. Some people are confused about the differences between switches and directors. I find there are three key differences:
Directors are designed for 24x7 operation, highly available with no single points of failure or repair. Generally, all components in directors are redundant and hot-swappable, including the control processors. In switches, some components are redundant and hot-swappable (such as fans and power supplies), but not the "motherboard" or controller. Often you have to take down a switch to make firmware or major hardware changes or upgrades.
Directors are designed to take in "blades" with different features, port counts, or protocol capabilities. You can add or remove blades while the system is up and running. Switches have a fixed number of ports. (A Small Form-factor Pluggable [SFP] optical transceiver is the component that turns electrical pulses into light pulses, and vice versa. You plug the SFP into the switch, and then plug the fiber optic cable into the SFP.)
With switches, you often start with a base number of active ports, and then can enable the rest of the ports as you need them.
Directors have hundreds of ports. Switches tend to have 64 ports or fewer.
Last year, Brocade acquired McDATA. Both were OEMs for IBM, and IBM distinguished that in the naming convention. The IBM SAN***B name was used to denote products manufactured for IBM by Brocade, and a SAN***M name was used to denote products manufactured by McDATA.
At that time, Brocade and McDATA equipment did not mix very well on the same fabric, so IBM retained the naming convention so that you as a customer knew which equipment would work together.
Brocade has now released new levels of both operating systems--Brocade's FOS and McDATA's EOS--and of their respective fabric managers--Brocade Fabric Manager (FM) and McDATA's Enterprise Fabric Connectivity Manager (EFCM)--to provide full interoperability.
Brocade's goal is to enhance EFCM to be a common software management platform for all of their products going forward.
IBM used the maximum port count in the name to provide some clue as to the size of the switch or director. The SAN16B-2 and the SAN32B-3 are switches with a maximum of 16 and 32 ports, respectively. The SAN256B supports a maximum of eight blades of your choosing. Two different types of FC port blades were supported, a 16-port blade and a 32-port blade. If all eight were 32-port blades, then the maximum was 256 ports, hence the name. But then Brocade began offering 48-port blades. Should IBM change the name? No, it decided to keep the SAN256B name even though the director can now have a maximum of 384 ports.
Not to confuse anyone, but the SAN768B also has a maximum of 384 ports, in the same 14U dimensions, with a special twist. Normally, to connect two directors together you use up ports from each in what are called "inter-switch links" (ISL). These are ports taken away from availability to servers and storage controllers. The SAN768B offers a new alternative called "inter-chassis links" (ICL). Each SAN768B has two processing blades, and each blade has two ICL ports, so with just four two-meter (2m) cables, you get the equivalent of 128 FC 8 Gbps ISL links without using 128 individual ports on each side. That is like giving you 256 ports back for use with servers and storage!
Since IBM directors require 240-volt power, the IBM TotalStorage SAN Cabinet C36 includes power distribution units (PDUs). PDUs are just glorified power strips, but a new intelligent PDU (iPDU) option introduces additional intelligence to monitor energy consumption, for customers looking to measure, and perhaps charge back, energy consumption to the rest of the business. You can stack two SAN768B directors in one cabinet, one on top of the other, and connected via ICLs they would look like one huge 768-port backbone.
As a backbone for your data center, the SAN768B is positioned for two emerging technologies:
8 Gbps Fibre Channel (FC)
The SAN768B is powerful enough to have 32-port blades run full speed on all ports off-blade without oversubscription. Oversubscription is an emotional topic.
Normally, blades (like switches) can handle all traffic at full speed without delays provided the in-bound and out-bound ports involved are all on the same blade. In a director, however, if you need to communicate from a port on one blade to a port on a different blade, it is possible that off-blade traffic might be constrained or delayed in its transit across the backplane.
On the SAN768B, both the 16-port and 32-port blades can run at full 8 Gbps speed, and the 48-port blade is exposed to oversubscription only if more than 32 ports are transferring data off-blade at full 8 Gbps concurrently.
The new 8 Gbps SFPs support auto-negotiation at N-1 and N-2 generation link speeds. This means that they will automatically slow down when communicating with 4 Gbps and 2 Gbps devices, but they cannot communicate with 1 Gbps devices. If you are still using 1 Gbps devices in your data center, you will need to use 4 Gbps SFPs (which also support 2 Gbps and 1 Gbps link speeds) to communicate with those older devices.
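The N-1/N-2 rule (an SFP speaks its own generation's speed plus the two before it) can be sketched as a simple lookup. The table below is my own illustration of the rule described above; consult the SFP data sheet for the actual supported speeds of any given part:

```python
# Speeds in Gbps; each SFP generation supports speeds N, N-1, and N-2.
SUPPORTED_SPEEDS = {
    8: {8, 4, 2},   # 8 Gbps SFP cannot fall back as far as 1 Gbps
    4: {4, 2, 1},   # 4 Gbps SFP covers the older 2 and 1 Gbps devices
}

def negotiated_speed(sfp_gbps, device_gbps):
    """Return the negotiated link speed, or None if the two cannot talk."""
    return device_gbps if device_gbps in SUPPORTED_SPEEDS[sfp_gbps] else None

print(negotiated_speed(8, 2))  # 2
print(negotiated_speed(8, 1))  # None -> use a 4 Gbps SFP for 1 Gbps devices
```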
Fibre Channel over Ethernet (FCoE)
Basically, this new technology enables transport of Fibre Channel packets over 10 Gbps Ethernet links. That 10 Gbps Ethernet can also be used to carry traditional iSCSI and TCP/IP traffic. FCoE introduces new extensions to provide Fibre Channel characteristics, like being lossless and offering consistent performance. The ANSI T11 committee is driving FCoE as an open standard, and at the moment it is not fully baked. I suggest you don't buy any FCoE equipment prematurely, as pre-standard devices or host bus adapters could get you burned later when the standard is finalized.
The idea is that FCoE blades can be installed in a SAN768B along with traditional FC blades, allowing routing of traffic between traditional FC and new FCoE ports. Those who have invested in FCIP for long distance replication will be able to continue using either FC or FCoE inputs.
One of the big drivers of FCoE is IBM BladeCenter. Currently, most BladeCenter blades support both Ethernet and FC connectivity and are connected to both Ethernet and FC switches on the back of each BladeCenter chassis. With FCoE, we have the potential to run both FC and IP traffic across simpler all-Ethernet blades, connecting through all-Ethernet switches on the backs of each chassis.
For more information on the IBM SAN768B, see the [IBM Press Release]. For more details on Brocade's strategy, here is an 8-page white paper on their [Data Center Fabric] vision.
A long time ago, perhaps in the early 1990s, I was an architect on the component known today as DFSMShsm on the z/OS mainframe operating system. One of my job responsibilities was to attend the biannual [SHARE] conference to listen to the requirements of the attendees on what they would like added or changed in DFSMS, and to ask enough questions so that I could accurately present the reasoning to the rest of the architects and software designers on my team. One person requested that the DFSMShsm RELEASE HARDCOPY command should release "all" the hardcopy. This command sends all the activity logs to the designated SYSOUT printer. I asked what he meant by "all", and the entire audience of some 120 attendees nearly fell on the floor laughing. He complained that some clever programmer had written code to test whether an activity log contained only "Starting" and "Ending" messages, but no error messages, and to skip those logs from being sent to SYSOUT. I explained that this was done to save paper, good for the environment, and so on. Again, howls of laughter. Most customers reroute the SYSOUT from DFSMS from a physical printer to a logical one that saves the logs as data sets, with date and time stamps, so having any logs "skipped" leaves gaps in the sequence. The client wanted a complete set of data sets for his records. Fair enough.
When I returned to Tucson, I presented the list of requests, and the immediate reaction when I presented the one above was, "What did he mean by ALL? Doesn't it release ALL of the logs already?" I then had to recap our entire dialogue, and then it all made sense to the rest of the team. At the following SHARE conference six months later, I was presented with my own official "All" tee-shirt that listed, and I am not kidding, some 33 definitions for the word "all", in small font covering the front of the shirt.
I am reminded of this story because of the challenges of explaining complicated IT concepts in English, a language full of overloaded words with multiple meanings. Take for example the word "protect". What does it mean when a client asks for a solution or system to "protect my data" or "protect my information"? Let's take a look at three different meanings:
The first meaning is to protect the integrity of the data from within, especially from executives or accountants who might want to "fudge the numbers" to make quarterly results look better than they are, or to "change the terms of the contract" after agreements have been signed. Clients need to make sure that the people authorized to read/write data can be trusted to do so, and can store data in Non-Erasable, Non-Rewriteable (NENR) storage for added confidence. NENR storage includes Write-Once, Read-Many (WORM) tape and optical media, as well as disk and disk-and-tape blended solutions such as the IBM Grid Medical Archive Solution (GMAS) and the IBM Information Archive integrated system.
The second meaning is to protect access from without, especially from hackers or other criminals who might want to gather personally identifiable information (PII) such as social security numbers, health records, or credit card numbers and use these for identity theft. This is why it is so important to encrypt your data. As I mentioned in my post [Eliminating Technology Trade-Offs], IBM supports hardware-based Full Disk Encryption (FDE) drives in its IBM System Storage DS8000 and DS5000 series. These FDE drives have built-in AES-128 encryption to perform the encryption in real time. Neither HDS nor EMC supports these drives (yet). Fellow blogger Hu Yoshida (HDS) indicates that their USP-V has implemented data-at-rest encryption in their array differently, using back-end directors instead. I am told EMC relies on the consumption of CPU cycles on the host servers to perform software-based encryption, either as MIPS consumed on the mainframe, or using their PowerPath multi-pathing driver on distributed systems.
There is also concern about internal employees having the right "need-to-know" for various research projects or upcoming acquisitions. On SANs, this is normally handled with zoning, and on NAS with appropriate group/owner bits and access control lists. That's fine for LUNs and files, but what about databases? IBM's DB2 offers Label-Based Access Control [LBAC], which provides a finer level of granularity, down to the row or column level. For example, if a hospital database contained patient information, the doctors and nurses would not see the columns containing credit card details, the accountants would not see the columns containing healthcare details, and the individual patients, if they had any access at all, would only be able to access the rows related to their own records, and possibly the records of their children or other family members.
The third meaning is to protect against the unexpected. There are lots of ways to lose data: physical failure, theft, or even incorrect application logic. Whatever the cause, you can protect against it by having multiple copies of the data. You can either keep multiple copies of the data in its entirety, or use RAID or a similar encoding scheme to store parts of the data in multiple separate locations. For example, with a RAID-5 rank in a 6+P+S configuration, you would have six parts of data and one part parity code scattered across seven drives. If you lost one of the disk drives, the data can be rebuilt from the remaining portions and written to the spare disk set aside for this purpose.
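The rebuild-from-parity idea can be demonstrated with XOR, the single-parity scheme underlying RAID-5. This is a minimal illustration with small integers standing in for disk blocks, not the DS8000's actual implementation:

```python
import functools
import operator

def xor_parity(chunks):
    """Single parity is simply the XOR of all data chunks."""
    return functools.reduce(operator.xor, chunks)

data = [0b1010, 0b0110, 0b1100, 0b0001, 0b1111, 0b0011]  # six data "drives" (6+P)
p = xor_parity(data)

# Lose drive 2: XOR the five surviving chunks with the parity to reconstruct it.
survivors = data[:2] + data[3:]
rebuilt = xor_parity(survivors + [p])
print(rebuilt == data[2])  # True
```

RAID-6's second, diagonal parity uses a different code (typically Reed-Solomon arithmetic rather than plain XOR) so that any two simultaneous drive failures can be recovered.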
But what if the drive is stolen? Someone can walk up to a disk system, snap out the hot-swappable drive, and walk off with it. Since it contains only part of the data, the thief would not have the entire copy of the data, so no reason to encrypt it, right? Wrong! Even with part of the data, people can get enough information to cause your company or customers harm, lose business, or otherwise get you in hot water. Encryption of the data at rest can help protect against unauthorized access to the data, even in the case when the data is scattered in this manner across multiple drives.
To protect against site-wide loss, such as from a natural disaster, fire, flood, earthquake and so on, you might consider having data replicated to remote locations. For example, IBM's DS8000 offers two-site and three-site mirroring. Two-site options include Metro Mirror (synchronous) and Global Mirror (asynchronous). The three-site is cascaded Metro/Global Mirror with the second site nearby (within 300km) and the third site far away. For example, you can have two copies of your data at site 1, a third copy at nearby site 2, and two more copies at site 3. Five copies of data in three locations. IBM DS8000 can send this data over from one box to another with only a single round trip (sending the data out, and getting an acknowledgment back). By comparison, EMC SRDF/S (synchronous) takes one or two trips depending on blocksize, for example blocks larger than 32KB require two trips, and EMC SRDF/A (asynchronous) always takes two trips. This is important because for many companies, disk is cheap but long-distance bandwidth is quite expensive. Having five copies in three locations could be less expensive than four copies in four locations.
Fellow blogger BarryB (EMC Storage Anarchist) felt I was unfair pointing out that their EMC Atmos GeoProtect feature only protects against "unexpected loss" and does not eliminate the need for encryption or appropriate access control lists to protect against "unauthorized access" or "unethical tampering".
(It appears I stepped too far on to ChuckH's lawn, as his Rottweiler BarryB came out barking, both in the [comments on my own blog post], as well as his latest titled [IBM dumbs down IBM marketing (again)]. Before I get another rash of comments, I want to emphasize this is a metaphor only, and that I am not accusing BarryB of having any canine DNA running through his veins, nor that Chuck Hollis has a lawn.)
As far as I know, the EMC Atmos does not support FDE disks that do this encryption for you, so you might need to find another way to encrypt the data and set up the appropriate access control lists. I agree with BarryB that "erasure codes" have been around for a while and that there is nothing unsafe about using them in this manner. All forms of RAID-5, RAID-6 and even RAID-X on the IBM XIV storage system can be considered a form of such encoding as well. As for the amount of long-distance bandwidth that Atmos GeoProtect would consume to provide this protection against loss, you might question any cost savings from this space-efficient solution. As always, you should consider both space and bandwidth costs in your total cost of ownership calculations.
Of course, if saving money is your main concern, you should consider tape, which can be ten to twenty times cheaper than disk, allowing you to keep a dozen or more copies, in as many time zones, at substantially lower cost. These can be encrypted and written to WORM media for even more thorough protection.
(Note: The following paragraphs have been updated to clarify the performance tests involved.)
This time, IBM breaks the 1 million IOPS barrier, achieved by running a test workload consisting of a 70/30 mix of random 4K requests. That is 70 percent reads, 30 percent writes, with 4KB blocks. The throughput achieved was 3.5 times that obtained by running the identical workload on the fastest IBM storage system today (IBM System Storage SAN Volume Controller 4.3),
and an estimated EIGHT* times the performance of EMC DMX. With an average response time under 1 millisecond, this solution would be ideal for online transaction processing (OLTP) such as financial transactions or airline reservations.
(*) Note: EMC has not yet published ANY benchmarks of their EMC DMX box with SSD enterprise flash drives (EFD). However, I believe that the performance bottleneck is in their controller and not the back-end SSD or FC HDD media, so I have given EMC the benefit of the doubt and estimated that their latest EMC DMX4 is as fast as an [IBM DS8300 Turbo] with Fibre Channel drives. If or when EMC publishes benchmarks, the marketplace can make more accurate comparisons. Your mileage may vary.
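The raw arithmetic behind the benchmark figures makes a quick sanity check. This is my own calculation from the numbers quoted above, not part of the published results:

```python
total_iops = 1_000_000
block_kb = 4

reads = total_iops * 70 // 100            # 70% reads
writes = total_iops - reads               # 30% writes
throughput_mb_s = total_iops * block_kb / 1024

print(reads, writes)    # 700000 300000
print(throughput_mb_s)  # 3906.25 -> nearly 4 GB/s moved at 1M IOPS with 4KB blocks
```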
IBM used 4 TB of Solid State Disk (SSD) behind its IBM SAN Volume Controller (SVC) technology to achieve this amazing result. Not only does this represent a significantly smaller footprint, but it uses only 55 percent of the power and cooling.
The SSD drives are made by [Fusion IO] and are different than those used by EMC made by STEC.
The SVC addresses the one key problem clients face today with competitive disk systems that support SSD enterprise flash drives: deciding which data to park on those expensive drives. How do you decide which LUNs, which databases, or which files should be permanently resident on SSD? With SVC's industry-leading storage virtualization capability, you are not forced to decide. You can move data into SSD and back out again non-disruptively, as needed to meet performance requirements. This could be handy for quarter-end or year-end processing, for example.
(FTC Disclosure: I do not work for, nor have any financial investments in, ENC Security Systems. ENC Security Systems did not pay me to mention them on this blog. Their mention in this blog is not an endorsement of either their company or any of their products. Information about EncryptStick was based solely on publicly available information and my own personal experiences. My friends at ENC Security Systems provided me a full-version pre-loaded stick for this review.)
The EncryptStick software comes in two flavors, a free/trial version and the full/paid version. The free trial version has [limits on capacity and time] but provides enough of a glimpse of the product to help you decide before you buy the full version. You can download the software yourself and put it on your own USB device, or purchase the pre-loaded stick that comes with the full-version license.
Whichever you choose, the EncryptStick offers three nice protection features:
Encryption for data organized in "storage vaults", which can be either on the stick itself, or on any other machine the stick is connected to. That is a nice feature, because you are not limited to the capacity of the USB stick.
Encrypted password list for all your websites and programs.
A secure browser that defeats any key-logging software or malware that might be on the host Windows machine.
I have tried out all three functions and everything works as advertised. However, there is always room for improvement, so here are my suggestions.
The first problem is that the pre-loaded stick looks like it is worth a million dollars. It is in a shiny bronze color with "EncryptStick" emblazoned on it. This is NOT subtle advertising! This 8GB stick looks like it would be worth stealing solely as a nice piece of jewelry, and the added bonus that there might be "valuable secrets" inside makes that possibility even more likely.
If you want to keep your information secure, it would help to have "plausible deniability" that there is nothing of value on a stick. Either have some corporate logo on it, or have the stick look like a cute animal, like these pig or chicken USB sticks.
It reminds me of how the first Apple iPods came in bright [Mug-me White]. I use black headphones with my black iPod to avoid this problem.
Of course, you can always install the downloadable version of EncryptStick software onto a less conspicuous stick if you are concerned about theft. The full/paid version of EncryptStick offers an option for "lost key recovery" which would allow you to backup the contents of the stick and be able to retrieve them on a newly purchased stick in the event your first one is lost or stolen.
Imagine how "unlucky" I felt when I noticed that I had lost the "rabbit's feet" on this cute animal-themed USB stick.
I sense trouble losing the cap on my EncryptStick as well. This might seem trivial, but it is a pet peeve of mine that USB stick designs should plan for. Not only is there nothing to keep the cap on (it slides on and off quite smoothly), but there is no loop to attach the cap to anything either.
Since then, I have gotten smart and look for ways to keep the cap connected. Some designs, like the IBM-logoed stick shown above, just rotate around an axle, giving you access when you need it and protection when it is folded closed.
Alternatively, get a little chain that allows you to attach the cap to the main stick. In the case of the pig and chicken, the memory section had a hole pre-drilled and a chain to put through it. I drilled an extra hole in the cap section of each USB stick, and connected the chain through both pieces.
(Warning: Kids, be sure to ask for assistance from your parents before using any power tools on small plastic objects.)
The EncryptStick software can run on either Microsoft Windows or Mac OS. The instructions indicate that you can install both downloadable versions onto a single stick, so why not do that for the pre-loaded full version? The stick I received had only the Windows version pre-loaded. I don't know if the Windows and Mac OS versions can unlock the same "storage vaults" on the stick.
Certainly, I have been to many companies where either everyone runs Windows or everyone runs Mac OS. If the primary target audience is to use this stick at work in one of those places, then no changes are required. However, at IBM, we have employees using Windows, Mac OS and Linux. In my case, I have all three! Ideally, I would like a version of EncryptStick that I could take on trips with me that would allow me to use it regardless of the Operating System I encountered.
Since there isn't a Linux version of the EncryptStick software, I decided to modify my stick to support booting Linux. I am finding more and more Linux kiosks when I travel, especially at airports and high-traffic locations, so having a stick that works in both Windows and Linux would be useful. Here are some suggestions if you want to try this at home:
Use fdisk to change the FAT32 partition type from "b" to "c". Apparently, Grub2 requires type "c", but the pre-loaded EncryptStick was set to "b". The Windows version of EncryptStick seems to work fine in either mode, so this is a harmless change.
Install Grub2 with "grub-install" from a working Linux system.
Once Grub2 is installed, you can boot ISO images of various Linux Rescue CDs, like [PartedMagic] which includes the open-source [TrueCrypt] encryption software that you could use for Linux purposes.
This USB stick could also be used to help repair a damaged or compromised Windows system. Consider installing [Ophcrack] or [Avira].
Certainly, 8GB is big enough to run a full Linux distribution. The latest 32-bit version of [Ubuntu] could run on any 32-bit or 64-bit Intel or AMD x86 machine, and have enough room to store an [encrypted home directory].
Since the stick is formatted FAT32, you should be able to run your original Windows or Mac OS version of EncryptStick with these changes.
Depending on where you are, you may not have the luxury to reboot a system from the USB memory stick. Certainly, this may require changes to the boot sequence in the BIOS and/or hitting the right keys at the right time during the boot sequence. I have been to some "Internet Cafes" that frown on this, or have blocked this altogether, forcing you to boot only from the hard drive.
Well, those are my suggestions. Whether you go on a trip with or without your laptop, it can't hurt to take this EncryptStick along. If you get a virus on your laptop, or have your laptop stolen, then it could be handy to have around. If you don't bring your laptop, you can use this at Internet cafes, hotel business centers, libraries, or other places where public computers are available.
In my post yesterday [Spreading out the Re-Replication process], fellow blogger BarryB [aka The Storage Anarchist] raises some interesting points and questions in the comments section about the new IBM XIV Nextra architecture. I answer these below not just for the benefit of my friends at EMC, but also for my own colleagues within IBM, IBM Business Partners, Analysts and clients that might have similar questions.
If RAID 5/6 makes sense on every other platform, why not so on the Web 2.0 platform?
Your attempt to justify the expense of Mirrored vs. RAID 5 makes no sense to me. Buying two drives for every one drive's worth of usable capacity is expensive, even with SATA drives. Isn't that why you offer RAID 5 and RAID 6 on the storage arrays that you sell with SATA drives?
And if RAID 5/6 makes sense on every other platform, why not so on the (extremely cost-sensitive) Web 2.0 platform? Is faster rebuild really worth the cost of 40+% more spindles? Or is the overhead of RAID 6 really too much for those low-cost commodity servers to handle.
Let's take a look at various disk configurations, for example 3TB on 750GB SATA drives:
JBOD: 4 drives
JBOD here is industry slang for "Just a Bunch of Disks" and was invented as the term for "non-RAID". Each drive would be accessible independently, at native single-drive speed, with no data protection. Putting four drives in a single cabinet like this provides simplicity and convenience only over four separate drives in their own enclosures.
RAID-10: 8 drives
RAID-10 is a combination of RAID-1 (mirroring) and RAID-0 (striping). In a 4x2 configuration, data is striped across disks 1-4, then these are mirrored across to disks 5-8. You get performance improvement and protection against a single drive failure.
RAID-5: 5 drives
This would be a 4+P configuration, where there would be four drives' worth of data scattered across five drives. This gives you almost the same performance improvement as RAID-10, similar protection against single drive failure, but with fewer drives per usable TB capacity.
RAID-6: 6 drives
This would be a 4+2P configuration, where the first P represents linear parity, and the second represents a diagonal parity. Similar in performance improvement as RAID-5, but protects against single and double drive failures, and still better than RAID-10 in terms of drives per TB usable capacity.
For all the RAID configurations, rebuild would require a spare drive, but often spares are shared among multiple RAID ranks, not dedicated to a single rank. As a result, you often need several spares per I/O loop, and a different set of spares for each combination of speed and capacity. If you had a mix of 15K/73GB, 10K/146GB, and 7200/500GB drives, then you would need three sets of spares to match.
In contrast, IBM XIV's innovative RAID-X approach doesn't require any spare drives, just spare capacity on the existing drives being used to hold data. The objects can be mirrored between any two types of drives, so there is no need to match one with another.
All of these RAID levels represent some trade-off between cost, protection and performance, and IBM offers each of these on various disk system platforms. Calculating parity is more complicated than just making mirrored copies, but this can be done with specialized chips and cache memory to minimize the performance impact. IBM generally recommends RAID-5 for high-performance FC disk, and RAID-6 for slower, large capacity SATA disk.
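The drive counts above can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch in Python; the 3TB-usable and 750GB-drive figures are taken from the example above:

```python
# Back-of-the-envelope drive counts for 3TB usable on 750GB SATA drives,
# matching the configurations discussed above.

DRIVE_GB = 750
USABLE_GB = 3000

def drives_needed(scheme):
    data_drives = USABLE_GB // DRIVE_GB           # 4 drives' worth of data
    if scheme == "JBOD":
        return data_drives                        # no protection
    if scheme == "RAID-10":
        return data_drives * 2                    # full mirror
    if scheme == "RAID-5":
        return data_drives + 1                    # 4+P
    if scheme == "RAID-6":
        return data_drives + 2                    # 4+2P
    raise ValueError(scheme)

for scheme in ("JBOD", "RAID-10", "RAID-5", "RAID-6"):
    n = drives_needed(scheme)
    overhead = (n - 4) / 4 * 100
    print(f"{scheme:8s} {n} drives  ({overhead:.0f}% extra spindles over JBOD)")
```

As the output makes plain, mirroring costs 100 percent extra spindles, while RAID-5 and RAID-6 cost only 25 and 50 percent respectively for similar protection.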
However, the question assumes that the drive cost is a large portion of the overall "disk system" cost. It isn't. For example, Jon Toigo discusses the cost of EMC's new AX4 disk system in his post [National Storage Rip-Off Day]:
EMC is releasing its low end Clariion AX4 SAS/SATA array with 3TB capacity for $8600. It ships with four 750GB SATA drives (which you and I could buy at list for $239 per unit). So, if the disk drives cost $956 (presumably far less for EMC), that means buyers of the EMC wares are paying about $7700 for a tin case, a controller/backplane, and a 4Gbps iSCSI or FC connector. Hmm.
Dell is offering EMC’s AX4-5 with same configuration for $13,000 adding a 24/7 warranty.
(Note: I checked these numbers. $8599 is the list price that EMC has on its own website. External 750GB drives available at my local Circuit City ranged from $189 to $329 list price. I could not find anything on Dell's own website, but found [The Register] to confirm the $13,000 with 24x7 warranty figure.)
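To make Toigo's point concrete, here is the arithmetic as a quick sketch. The list prices are the ones quoted above; EMC's actual component costs are unknown:

```python
# The arithmetic behind Toigo's point, using the list prices quoted above.
system_price = 8599          # EMC AX4 list price, from EMC's own website
drive_price = 239            # one 750GB SATA drive at retail list
drives = 4

drive_cost = drives * drive_price            # raw cost of the drives
everything_else = system_price - drive_cost  # enclosure, controller, connectors

print(f"Raw drives:      ${drive_cost}")
print(f"Everything else: ${everything_else}")
print(f"Drives are {drive_cost / system_price:.0%} of the system price")
```

At retail list, the four drives are roughly 11 percent of the system price, which is why drive-count comparisons alone say little about total cost.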
Disk capacity is a shrinking portion of the total cost of ownership (TCO). In addition to capacity, you are paying for cache, microcode and the electronics of the system itself, along with software and services that are included in the mix, and your own storage administrators to deal with configuration and management. For more on this, see [XIV storage - Low Total Cost of Ownership].
EMC Centera has been doing this exact type of blob striping and protection since 2002
As I've noted before, there's nothing "magic" about it - Centera has been employing the same type of object-level replication for years. Only EMC's engineers have figured out how to do RAID protection instead of mirroring to keep the hardware costs low while not sacrificing availability.
I agree that IBM XIV was not the first to do an object-level architecture, but it was one of the first to apply object-level technologies to the particular "use case" and "intended workload" of Web 2.0 applications.
The RAID-5 based EMC Centera was designed instead to hold fixed-content data that needs to be protected for a specific period of time, such as to meet government regulatory compliance requirements. This is data that you most likely will never look at again unless you are hit with a lawsuit or investigation. For this reason, it is important to get it onto the cheapest storage configuration possible. Before EMC Centera, customers stored this data on WORM tape and optical media, so EMC came up with a disk-only alternative offering. IBM System Storage DR550 offers disk-level access for the most recent archives, with the ability to migrate to much less expensive tape for long term retention. The end result is that storing on a blended disk-plus-tape solution can help reduce the cost by a factor of 5x to 7x, making the RAID level discussion meaningless in this environment. For more on this, see my post [Optimizing Data Retention and Archiving].
While both the Centera and DR550 are based on SATA, neither is designed for Web 2.0 platforms. When EMC comes out with their own "me, too" version, they will probably make a similar argument.
IBM XIV Nextra is not a DS8000 replacement
Nextra is anything but Enterprise-class storage, much less a DS8000 replacement. How silly of all those folks to suggest such a thing.
I did searches on the Web and could not find anybody, other than EMC employees, who suggested that the IBM XIV Nextra architecture represented a replacement for the IBM System Storage DS8000. The IBM XIV press release does not mention or imply this, and certainly nobody I know at IBM has suggested it.
The DS8000 is designed for a different "use case" and set of "intended workloads" than what the IBM XIV was designed for. The DS8000 is the most popular disk system for our IBM System z mainframe platform, for activities like Online Transaction Processing (OLTP) and large databases, supporting ESCON and FICON attachment to high-speed 15K RPM FC drives. Web 2.0 customers that might choose IBM XIV Nextra for their digital content might run their financial operations or metadata search indexes on a DS8000. Different storage for different purposes.
As for the opinion that this is not "enterprise class", there are a variety of definitions for this phrase. Some analysts look at the "price band" of units that cost over $300,000 US dollars. Other analysts define it as being attachable to mainframe servers via ESCON or FICON. Others use the term to refer to five-nines reliability, having less than 5 minutes of downtime per year. In this regard, based on the past two years of experience at 40 customer locations, I would argue that it meets this last definition, with non-disruptive upgrades, microcode updates and hot-swappable components.
By comparison, when EMC introduced its object-level Centera architecture, nobody suggested it was the replacement for their Symmetrix or CLARiiON devices. Was it supposed to be?
Given drive growth rates have slowed, improving utilization is mandatory to keep up with 60-70 percent CAGR
Look around you, Tony- all of your competitors are implementing thin provisioning specifically to drive physical utilization upwards towards 60-80%, and that's on top of RAID 5/RAID 6 storage and not RAID 1. Given that disk drive growth rates and $/GB cost savings have slowed significantly, improving utilization is mandatory just to keep up with the 60-70% CAGR of information growth.
Disk drive capacity growth has slowed for FC disk because much of the attention and investment has been redirected to ATA technology. Dollar-per-GB price reduction is slowing for disks in general, as researchers hit physical limitations on the number of bits they can pack per square inch of disk media, and is now around 25 percent per year. The 60-70 percent Compound Annual Growth Rate (CAGR) is real, and can be even higher for Web 2.0 providers. While hardware costs drop, the big ticket items to watch will be software, services and storage administrator labor costs.
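A quick model shows why utilization matters even as prices fall: if data grows 65 percent per year while $/GB drops 25 percent, total spend still grows roughly 24 percent per year. The growth rates here are assumed midpoints of the figures above:

```python
# A simple model: data grows 65%/year while $/GB falls 25%/year.
# The combined factor 1.65 * 0.75 = 1.2375, so spend still grows ~24%/year,
# which is why improving utilization is mandatory.
data_growth = 1.65      # assumed midpoint of the 60-70% information CAGR
price_decline = 0.75    # $/GB drops 25% per year

capacity_gb = 100_000.0  # hypothetical starting point: 100TB
price_per_gb = 1.00      # hypothetical starting price

for year in range(1, 6):
    capacity_gb *= data_growth
    price_per_gb *= price_decline
    spend = capacity_gb * price_per_gb
    print(f"Year {year}: {capacity_gb/1000:8.0f} TB at ${price_per_gb:.3f}/GB -> ${spend:,.0f}")
```

After five years the hypothetical shop holds twelve times the data and spends nearly three times as much on raw capacity, before counting software, services and labor.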
To this end, IBM XIV Nextra offers thin provisioning and differential space-efficient snapshots. It is designed for 60-90 percent utilization, and can be expanded to larger capacities non-disruptively in a very scalable manner.
Continuing my coverage of the IBM Dynamic Infrastructure Executive Summit at the Fairmont Resort in Scottsdale, Arizona, we had a day full of main-tent sessions. Here is a quick recap of the sessions presented in the morning.
Leadership and Innovation on a Smarter Planet
Todd Kirtley, IBM General Manager of the western United States, kicked off the day. He explained that we are now entering the Decade of Smart: smarter healthcare, smarter energy, smarter traffic systems, and smarter cities, to name a few. One of those smarter cities is Dubuque, Iowa, nicknamed the Masterpiece of the Mississippi. Mayor Roy Buol of Dubuque spoke next, giving his testimonial on working with IBM. I have never been to Dubuque, but it looks and sounds like a fun place to visit. Here is the [press release] and a two-minute [video].
Smarter Systems for a Smarter Planet
Tom Rosamilia, IBM General Manager of the System z mainframe platform, presented on smarter systems. IBM is intentionally designing integrated systems to redefine performance and deliver the highest possible value for the least amount of resource. The five key focus areas were:
Enabling massive scale
Organizing vast amounts of data
Turning information into insight
Increasing business agility
Managing risk, security and compliance
The Future of Systems
Ambuj Goyal, IBM General Manager of Development and Manufacturing, presented the future of systems. For example, reading 10 million electricity meters monthly is only 120 million transactions per year, but reading them daily is 3.65 billion, and reading them every 15 minutes will result in over 350 billion transactions per year. What would it take to handle this? Beyond just faster speeds and feeds, beyond consolidation through virtualization and multi-core systems, beyond pre-configured fit-for-purpose appliances, there will be a new level for integrated systems. Imagine a highly dense integration with over 3000 processors per frame, over 400 Petabytes (PB) of storage, and 1.3 PB/sec bandwidth. Integrating software, servers and storage will make this big jump in value possible.
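The meter-reading arithmetic above can be verified in a few lines:

```python
# Checking the smart-meter transaction counts mentioned above.
meters = 10_000_000

monthly = meters * 12                           # 120 million per year
daily = meters * 365                            # 3.65 billion per year
reads_per_day = 24 * 60 // 15                   # every 15 minutes = 96 reads/day
every_15_min = meters * reads_per_day * 365     # over 350 billion per year

print(f"Monthly reads:    {monthly:,} transactions/year")
print(f"Daily reads:      {daily:,} transactions/year")
print(f"Every 15 minutes: {every_15_min:,} transactions/year")
```

Going from monthly to 15-minute reads is nearly a 3000x increase in transaction volume, which is the scaling challenge Goyal described.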
POWERing your Planet
Ross Mauri, IBM General Manager of Power Systems, presented the latest POWER7 processor server product line. The IBM POWER-based servers can run any mix of AIX, Linux and IBM i (formerly i5/OS) operating system images. Compared to the previous POWER6 generation, POWER7 servers are four times more energy efficient, with twice the performance, at about the same price. For example, an 8-socket p780 with 64 cores (eight per socket) and 256 threads (4 threads per core) set a record-breaking result of 37,000 SAP users in a standard SD 2-tier benchmark, beating out 32-socket and 64-socket M9000 SPARC systems from Oracle/Sun and 8-socket Nehalem-EX Fujitsu 1800E systems. See the [SAP benchmark results] for full details. With more TPC-C performance per core, the POWER7 is 4.6 times faster than HP Itanium and 7.5 times faster than the Oracle Sun T5440.
This performance can be combined with incredible scalability. IBM's PowerVM outperforms VMware by 65 percent and provides features like "Live Partition Mobility" that is similar to VMware's VMotion capability. IBM's PureScale allows DB2 to scale out across 128 POWER servers, beating out Oracle RAC clusters.
The final speaker in the morning was Greg Lotko, IBM Vice President of Information Management Warehouse solutions. Analytics are required to gain greater insight from information, and this can result in better business outcomes. The [IBM Global CFO Study 2010] shows that companies that invest in business insight consistently outperform all other enterprises, with 33 percent more revenue growth, 32 percent more return on invested capital (ROIC), and 12 times more earnings (EBITDA). Business Analytics is more than just traditional business intelligence (BI). It tries to answer three critical questions for decision makers:
What is happening?
Why is it happening?
What is likely to happen in the future?
The IBM Smart Analytics System is a pre-configured integrated system appliance that combines text analytics, data mining and OLAP cubing software on a powerful data warehouse platform. It comes in three flavors: Model 5600 is based on System x servers, Model 7600 based on POWER7 servers, and Model 9600 on System z mainframe servers.
IBM has over 6000 business analytics and optimization consultants to help clients with their deployments.
While this might appear as "Death by PowerPoint", I think the panel of presenters did a good job providing real examples to emphasize their key points.
Wrapping up my series on a [Laptop for Grandma], I finally have something that I think meets all of my requirements! Special thanks to Guidomar and the rest of my readers who sent in suggestions!
I could have called this series "The Good, the Bad, and the Ugly". The [Cloud-oriented choices] weren't bad per se, but expected a persistent Internet connection. The [Low-RAM choices] were not ugly per se, but had limited application options. The ones below were good, in that they helped me decide what would be just right for grandma.
Linux Mint 9
One of my readers, Guidomar, suggested Linux Mint Xfce. At LinuxFest Northwest 2012, Bryan Lunduke indicated that [Linux Mint] is the fastest growing Linux in popularity. You can watch his 43-minute presentation of [Why Linux Sucks!] on YouTube.
The latest version is Mint 14, but that has grown so big it has to be installed on a DVD, as it will no longer fit on a 700MB CD-ROM. Since I don't have a DVD drive on this Thinkpad R31, I dropped down to the latest Gnome edition that did fit on a LiveCD, which was Mint 9.
(In retrospect, I could have used the [PLoP Boot Manager CD], and installed the latest Linux Mint 14 from a USB memory stick! My concern was that if a distribution didn't fit on a CD-ROM, it was expecting a more modern computer overall, and thus would probably require more than 384MB of RAM as well.)
Linux Mint is actually a variant of Ubuntu, which means that it can tap into the thousands of applications already available. Mint 9 is based on Ubuntu 10.04 LTS.
One of the nice features of Linux Mint is that there are versions with full [Codecs] installed. A codec is a coder/decoder software routine that can convert a digital data stream or signal, such as for audio or video data. Many formats are proprietary, so codecs are generally not open source, and often not included in most Linux distros. They can be installed manually by the Linux administrator. Windows and Mac OS are commercially sold and don't have this problem, as Microsoft and Apple take care of all the licensing issues behind the scenes.
The installation went smoothly. It would have gladly set up a dual-boot with Windows for me, but instead I opted to wipe the disk clean and install fresh for each Linux distribution I tried.
Running it was a different matter. The screen would go black and crash. There just wasn't enough memory.
Lubuntu 12.04
Since [Peppermint OS] was partially based on Lubuntu, I thought I would give [Lubuntu 12.04] a try. The difference is that Peppermint OS is based on Xfce (as is Xubuntu), but Lubuntu claims to have a smaller memory footprint using the Lightweight X11 Desktop Environment (LXDE). This version claims to run in 384MB, which is what I have on grandma's Thinkpad R31.
There are two installers. The main installer requires more than 512MB to run, so I used the alternate text-based Installer-only CD, which needs only 192MB.
The LXDE GUI is simple and straightforward. As with Peppermint OS, I did have to install the Codec plugins. However, the time-to-first-note was less than two minutes, so we can count this as a success!
Linux Mint 12 LXDE edition
Circling back to Linux Mint, I realized that my problem above was choosing the wrong edition. Linux Mint comes in various editions; the main edition I had selected was based on Gnome, which requires at least 512MB of RAM.
Other editions are based on KDE, Xfce and LXDE. Linux Mint 9 LXDE requires only 192MB of RAM, and the newer Linux Mint 12 LXDE requires only 256MB. I chose the latter, and the install went pretty much the same as Mint and Lubuntu above.
The music player that comes pre-installed is called [Exaile], which supports playlists, audio CDs, and a variety of other modern features, so no reason to install Rhythmbox or anything else. Grandma can even rip her existing audio CDs to import her music into MP3 format. Time-to-first-note was about two minutes.
The best part: the OS only takes up about 4GB of disk, leaving about 15GB for MP3 music files!
Lubuntu and Linux Mint LXDE were similar, but I decided to go with the latter because I like that they do not force version upgrades. This is a philosophical difference. Ubuntu likes to keep everyone on the latest supported releases, so it will often remind you it's time to upgrade. Linux Mint prefers an if-it-ain't-broke-don't-fix-it approach, which means less ongoing maintenance for me.
A few finishing touches to make the system complete:
A nice wallpaper from [InterfaceLift]. This website has high-res photographs that are just stunning.
Power management with screen-saver settings to a nice pink background with white snowflakes falling.
A small collection of her MP3 music pre-loaded so that she would have something to listen to while she learns how to rip CDs and copy over the rest of her music.
Icons on the main desktop for Exaile, My Computer, Home Directory, and the Welcome Screen.
Larger Font size, to make it easier to read.
Update settings that only look for levels "1" and "2". There are five levels, but "1" and "2" are considered the safest, tested versions. Also, an update is only done if it does not involve installing or removing other packages. This should offer some added stability.
I considered installing [ClamAV] for anti-virus protection, but since this laptop will not be connected to the Internet, I decided not to burn up CPU cycles. I also considered installing [Team Viewer], which would allow me remote access to her system if anything should ever fail. However, since she does not have Wi-Fi at home, and lives only a few minutes across town, I decided to leave this off.
Once again, I want to thank all of my readers for their suggestions! I learned quite a lot on this journey, and am glad that I have something that I am proud to present to grandma: boots quickly enough, simple to use, and does not require on-going maintenance!
Well, it's Tuesday again, and you know what that means... IBM announcements! Today, IBM announces that next Monday marks the 60th anniversary of the first commercial digital tape storage system! I am on the East coast this week visiting clients, but plan to be back in Tucson in time for the cake and fireworks next Monday.
1925 - masking tape (which 3M sold under its newly announced Scotch® brand)
1930 - clear cellulose-based tape (today, when people say Scotch tape, they usually are referring to the cellulose version)
1935 - Allgemeine Elektrizitäts-Gesellschaft (AEG) presents the Magnetophon K1, audio recording on analog tape
1942 - Duct tape
1947 - Bing Crosby adopts audio recording for his radio program. This eliminated him doing the same program live twice per day, perhaps the first example of using technology for "deduplication".
According to the IBM Archives, the [IBM 726 tape drive was formally announced May 21, 1952]. It was the size of a refrigerator, and the tape reel was the size of a large pizza. The next time you pull a frozen pizza from your freezer, you can remember this month's celebration!
When I first joined IBM in 1986, there were three kinds of IBM tape: the round reel called 3420, the square cartridge called 3480, and the tubes that contained a wide swath of tape stored in honeycomb shelves, called the [IBM 3850 Mass Storage System].
My first job at IBM was to work on DFHSM, which was specifically started in 1977 to manage the IBM 3850, and later renamed to the DFSMShsm component of the DFSMS element of the z/OS operating system. This software was instrumental in keeping disk and tape at high 80-95 percent utilization rates on mainframe servers.
While visiting a client in Detroit, I learned the client loved their StorageTek tape automation silo, but didn't care that the StorageTek drives inside were incompatible with IBM formats. They wanted to put IBM drives into the StorageTek silos. I agreed it was a good idea, and brought this back to the attention of development. In a contentious meeting with management and engineers, I presented this feedback from the client.
Everyone in the room said IBM couldn't do that. I asked "Why not?" The software engineers I spoke to already said they could support it. With StorageTek at the brink of Chapter 11 bankruptcy, I argued that IBM drives in their tape automation would ease the transition of our mainframe customers to an all-IBM environment.
Was the reason related to business/legal concerns, or was there a hardware issue? It turned out to be a little of both. On the business side, IBM had to agree to work with StorageTek on service and support for its mutual clients in mixed environments. On the technical side, the drive had to be tilted 12 degrees to line up with the robotic hand. A few years later, the IBM silo-compatible 3592 drive was commercially available.
Rather than putting StorageTek completely out of business, the move had the opposite effect. Now that IBM drives could be put in StorageTek libraries, everyone wanted one, basically bringing StorageTek back to life. This forced IBM to offer its own tape automation libraries.
In 1993, I filed my first patent. It was for the RECYCLE function in DFHSM to consolidate valid data from partial tapes to fresh new tapes. Before my patent, the RECYCLE function selected tapes alphabetically, by volume serial (VOLSER). My patent evaluated all tapes based on how full they were, and sorted them least-full to most-full, to maximize the return of cartridges.
Different tape cartridges can hold different amounts of data, especially with different formats on the same media type, with or without compression, so calculating the percentage full turned out to be a tricky algorithm that continues to be used in mainframe environments today.
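The selection idea can be sketched as follows. This is an illustrative Python sketch of the concept described above, not actual DFHSM logic, and the tape data is made up:

```python
# A sketch of the RECYCLE selection idea described above: instead of picking
# tapes alphabetically by VOLSER, sort candidates least-full first, so each
# GB of valid data moved frees the largest number of cartridges.
# (Illustrative only -- not actual DFHSM code; capacities are hypothetical.)

def recycle_order(tapes):
    """tapes: list of (volser, valid_gb, capacity_gb) tuples.
    Returns VOLSERs sorted least-full first, so the emptiest
    cartridges are reclaimed with the least data movement."""
    return [volser for volser, valid, cap in
            sorted(tapes, key=lambda t: t[1] / t[2])]

tapes = [
    ("A00001", 380, 400),   # 95% full -- recycle last
    ("B00002",  20, 400),   #  5% full -- recycle first
    ("C00003", 200, 800),   # 25% full (larger-capacity media)
]
print(recycle_order(tapes))   # ['B00002', 'C00003', 'A00001']
```

Note that sorting on the *percentage* full, rather than raw GB of valid data, is what handles mixed media types and compression correctly: the C00003 cartridge holds more valid data than B00002 but is still a better recycle candidate than A00001.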
The patent was popular for cross-licensing, and IBM has since filed additional patents for this invention in other countries to further increase its license revenue for intellectual property.
In 1997, IBM launched the IBM 3494 Virtual Tape Server (VTS), the first virtual tape storage device, blending disk and tape to optimal effect. This was based on the IBM 3850 Mass Storage System, the first virtual disk system, which used 3380 disk and tape to emulate the older 3350 disk systems.
In the VTS, tape volume images would be emulated as files on a disk system, then later moved to physical tape. We would call the disk the "Tape Volume Cache", and use caching algorithms to decide how long to keep data in cache, versus destage to tape. However, there were only a few tape drives, and sometimes when the VTS was busy, there were no tape drives available to destage the older images, and the cache would fill up.
I had already solved this problem in DFHSM, with a function called pre-migration. The idea was to pre-emptively copy data to tape, but leave it also on disk, so that when it needed to be destaged, all we had to do was delete the disk copy and activate the tape copy. We patented using this idea for the VTS, and it is still used in the successor models of the IBM System Storage TS7740 virtual tape libraries today.
Today, tape continues to be the least expensive storage medium, about 15 to 25 times less expensive, dollar-per-GB, than disk technologies. A dollar of today's LTO-5 tape can hold 22 days worth of MP3 music at 192 Kbps recording. A full TS1140 tape cartridge can hold 2 million copies of the book "War and Peace".
(If you have not read the book, Woody Allen took a speed reading course and read the entire novel in just 20 minutes. He summed up the novel in three words: "It involves Russia." By comparison, in the same 20 minutes, at 650MB/sec, the TS1140 drive can read this novel over and over 390,000 times.)
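The music arithmetic works out as follows. The cartridge capacity is LTO-5's 1.5TB native figure; the cartridge price here is an assumption for illustration, since street prices vary:

```python
# The "days of music per dollar" arithmetic, under stated assumptions:
# an LTO-5 cartridge holds 1.5TB native; the ~$33 cartridge price is an
# assumed street price for illustration (actual prices vary).

BITRATE_BPS = 192_000                       # 192 Kbps MP3 recording
bytes_per_day = BITRATE_BPS / 8 * 86_400    # ~2.07 GB of audio per day

cartridge_bytes = 1.5e12                    # 1.5 TB native LTO-5 capacity
cartridge_price = 33.0                      # assumed US dollars (hypothetical)

days_per_dollar = (cartridge_bytes / cartridge_price) / bytes_per_day

print(f"One day of 192Kbps audio: {bytes_per_day/1e9:.2f} GB")
print(f"Music per dollar:         {days_per_dollar:.0f} days")
```

Under these assumptions a dollar of tape holds roughly 22 days of continuous music, consistent with the figure above; a pricier cartridge would scale the result down proportionally.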
If you have your own "war stories" about tape, I would love to hear them, please consider posting a comment below.