Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
Tony Pearson is a Master Inventor and Senior Software Engineer for the IBM Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th year anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services. You can also follow him on Twitter @az990tony.
(Short URL for this blog: ibm.co/Pearson
Miles per Gallon measures an effeciency ratio (amount of work done with a fixed amount of energy), not a speed ratio (distance traveled in a unit of time).
Given that IOPs and MB/s are the unit of "work" a storage array does, wouldn't the MPG equivalent for storage be more like IOPs per Watt or MB/s per Watt? Or maybe just simply Megabytes Stored per Watt (a typical "green" measurement)?
You appear to be intentionally avoiding the comparison of I/Os per Second and Megabytes per Second to Miles Per Hour?
May I ask why?
This is a fair question, Barry, so I will try to address it here.
It was not a typo, I did mean MPG (miles per gallon) and not MPH (miles per hour). It is always challenging to find an analogy that everyone can relate to explain concepts in Information Technology that might be harder to grasp. I chose MPG because it was closely related to IOPS and MB/s in four ways:
MPG applies to all instances of a particular make and model. Before Henry Ford and the assembly line, cars were made one at a time, by a small team of craftsmen, and so there could be variety from one instance to another. Today, vehicles and storage systems are mass-produced in a manner that provides consistent quality. You can test one vehicle, and safely assume that all similar instances of the same make and model will have the similar mileage. The same is true for disk systems, test one disk system and you can assume that all others of the same make and model will have similar performance.
MPG has a standardized measurement benchmark that is publicly available. The US Environmental Protection Agency (EPA) is an easy analogy for the Storage Performance Council, providing the results of various offerings to chose from.
MPG has usage-specific benchmarks to reflect real-world conditions.The EPA offers City MPG for the type of driving you do to get to work, and Highway MPG, to reflect the type ofdriving on a cross-country trip. These serve as a direct analogy to SPC having SPC-1 for Online transaction processing (OLTP) and SPC-2 for large file transfers, database queries and video streaming.
MPG can be used for cost/benefit analysis.For example, one could estimate the amount of business value (miles travelled) for the amount of dollar investment (cost to purchase gallons of gasoline, at an assumed gas price). The EPA does this as part of their analysis. This is similar to the way IOPS and MB/s can be divided by the cost of the storage system being tested on SPC benchmark results. The business value of IOPS or MB/s depends on the application, but could relate to the number of transactions processed per hour, the number of music downloads per hour, or number of customer queries handled per hour, all of which can be assigned a specific dollar amount for analysis.
It seemed that if I was going to explain why standardized benchmarks were relevant, I should find an analogy that has similar features to compare to. I thought about MPH, since it is based on time units like IOPS and MB/s, butdecided against it based on an earlier comment you made, Barry, about NASCAR:
Let's imagine that a Dodge Charger wins the overwhelming majority of NASCAR races. Would that prove that a stock Charger is the best car for driving to work, or for a cross-country trip?
Your comparison, Barry, to car-racing brings up three reasons why I felt MPH is a bad metric to use for an analogy:
Increasing MPH, and driving anywhere near the maximum rated MPH for a vehicle, can be reckless and dangerous,risking loss of human life and property damage. Even professional race car drivers will agree there are dangers involved. By contrast, processing I/O requests at maximum speed poses no additional risk to the data, nor possibledamage to any of the IT equipment involved.
While most vehicles have top speeds in excess of 100 miles per hour, most Federal, State and Local speed limits prevent anyone from taking advantage of those maximums. Race-car drivers in NASCAR may be able to take advantage of maximum MPH of a vehicle, the rest of us can't. The government limits speed of vehicles precisely because of the dangers mentioned in the previous bullet. In contrast, processing I/O requests at faster speeds poses no such dangers, so the government poses no limits.
Neither IOPS nor MB/s match MPH exactly.Earlier this week,I related IOPS to "Questions handled per hour" at the local public library, and MB/s to "Spoken words per minute" in those replies. If I tried to find a metric based on unit type to match the "per second" in IOPS and MB/s, then I would need to find a unit that equated to "I/O requests" or "MB transferred" rather than something related to "distance travelled".
In terms of time-based units, the closest I could come up with for IOPS was acceleration rate of zero-to-sixty MPH in a certain number of seconds. Speeding up to 60MPH, then slamming the breaks, and then back up to 60MPH, start-stop, start-stop, and so on, would reflect what IOPS is doing on a requestby request basis, but nobody drives like this (except maybe the taxi cab drivers here in Malaysia!)
Since vehicles are limited to speed limits in normal road conditions, the closest I could come up with for MB/s would be "passenger-miles per hour", such that high-occupancy vehicles like school buses could deliver more passengers than low-occupancy vehicles with only a few passengers.
Neither start-stops nor passenger-miles per hour have standardized benchmarks, so they don't work well for comparisonbetween vehicles.If you or anyone can come up with a metric that will help explain the relevance of standardized benchmarks better than the MPG that I already used, I would be interested in it.
You also mention, Barry, the term "efficiency" but mileage is about "fuel economy".Wikipedia is quick to point out that the fuel efficiency of petroleum engines has improved markedly in recent decades, this does not necessarily translate into fuel economy of cars. The same can be said about the performance of internal bandwidth ofthe backplane between controllers and faster HDD does not necessarily translate to external performance of the disk system as a whole. You correctly point this out in your blog about the DMX-4:
Complementing the 4Gb FC and FICON front-end support added to the DMX-3 at the end of 2006, the new 4Gb back-end allows the DMX-4 to support the latest in 4Gb FC disk drives.
You may have noticed that there weren't any specific performance claims attributed to the new 4Gb FC back-end. This wasn't an oversight, it is in fact intentional. The reality is that when it comes to massive-cache storage architectures, there really isn't that much of a difference between 2Gb/s transfer speeds and 4Gb/s.
Oh, and yes, it's true - the DMX-4 is not the first high-end storage array to ship a 4Gb/s FC back-end. The USP-V, announced way back in May, has that honor (but only if it meets the promised first shipments in July 2007). DMX-4 will be in August '07, so I guess that leaves the DS8000 a distant 3rd.
This also explains why the IBM DS8000, with its clever "Adaptive Replacement Cache" algorithm, has such highSPC-1 benchmarks despite the fact that it still uses 2Gbps drives inside. Given that it doesn't matter between2Gbps and 4Gbps on the back-end, why would it matter which vendor came first, second or third, and why call it a "distant 3rd" for IBM? How soon would IBM need to announce similar back-end support for it to be a "close 3rd" in your mind?
I'll wrap up with you're excellent comment that Watts per GB is a typical "green" metric. I strongly support the whole"green initiative" and I used "Watts per GB" last month to explain about how tape is less energy-consumptive than paper.I see on your blog you have used it yourself here:
The DMX-3 requires less Watts/GB in an apples-to-apples comparison of capacity and ports against both the USP and the DS8000, using the same exact disk drives
It is not clear if "requires less" means "slightly less" or "substantially less" in this context, and have no facts from my own folks within IBM to confirm or deny it. Given that tape is orders of magnitude less energy-consumptive than anything EMC manufacturers today, the point is probably moot.
I find it refreshing, nonetheless, to have agreed-upon "energy consumption" metrics to make such apples-to-apples comparisons between products from different storage vendors. This is exactly what customers want to do with performance as well, without necessarily having to run their own benchmarks or work with specific storage vendors. Of course, Watts/GB consumption varies by workload, so to make such comparisons truly apples-to-apples, you would need to run the same workload against both systems. Why not use the SPC-1 or SPC-2 benchmarks to measure the Watts/GB consumption? That way, EMC can publish the DMX performance numbers at the same time as the energy consumption numbers, and then HDS can follow suit for its USP-V.
I'm on my way back to the USA soon, but wanted to post this now so I can relax on the plane.
Wrapping up this week's exploration on disk system performance, today I willcover the Storage Performance Council (SPC) benchmarks, and why I feel they are relevant to help customers make purchase decisions. This all started to address a comment from EMC blogger Chuck Hollis, who expressed his disappointment in IBM as follows:
You've made representations that SPC testing is somehow relevant to customers' environments, but offered nothing more than platitudes in support of that statement.
Apparently, while everyone else in the blogosphere merely states their opinions and moves on,IBM is held to a higher standard. Fair enough, we're used to that.Let's recap what we covered so far this week:
Monday, I explained how seemingly simple questions like "Which is the tallestbuilding?" or "Which is the fastest disk system?" can be steeped in controversy.
Tuesday, I explored what constitutes a disk system. While there are special storage systemsthat include HDD that offer tape-emulation, file-oriented access, or non-erasable non-rewriteable protection,it is difficult to get apples-to-apples comparisions with storage systems that don't offer these special features.I focused on the majority of general-purpose disk systems, those that are block-oriented, direct-access.
Today, I will explore ways to apply these metrics to measure and compare storageperformance.
Let's take, for example, an IBM System Storage DS8000 disk system. This has a controller thatsupports various RAID configurations, cache memory, and HDD inside one or more frames.Engineers who are testing individual components of this system might run specifictypes of I/O requests to test out the performance or validate certain processing.
100% read-hit, this means that all the I/O requests are to read data expectedto be in the cache.
100% read-miss, this means that all the I/O requests are to read data expectedNOT to be in the cache, and must go fetch the data from HDD.
100% write-hit, this means that all the I/O requests are to write data into cache.
100% write-miss, this means that all the I/O requests are to bypass the cache,and are immediately de-staged to HDD. Depending on the RAID configuration, this can result in actually reading or writing several blocks of data on HDD to satisfy thisI/O request.
Known affectionately in the industry as the "four corners" test, because you can show them on a box, with writes on the left, reads on the right,hits on the top, and misses on the bottom.Engineers are proud of these results, but these workloads do notreflect any practical production workload. At best, since all I/O requests are oneof these four types, the four corners provide an expectation range from the worst performance (most often write-missin the lower left corner)and the best performance (most often read-hit in the upper right corner) you might get with a real workload.
To understand what is needed to design a test that is more reflective of real business conditions,let's go back to yesterday's discussion of fuel economy of vehicles, with mileage measured in miles per gallon.The How Stuff Works websiteoffers the following description for the two measurements taken by the EPA:
The "city" program is designed to replicate an urban rush-hour driving experience in which the vehicle is started with the engine cold and is driven in stop-and-go traffic with frequent idling. The car or truck is driven for 11 miles and makes 23 stops over the course of 31 minutes, with an average speed of 20 mph and a top speed of 56 mph.
The "highway" program, on the other hand, is created to emulate rural and interstate freeway driving with a warmed-up engine, making no stops (both of which ensure maximum fuel economy). The vehicle is driven for 10 miles over a period of 12.5 minutes with an average speed of 48 mph and a top speed of 60 mph.
Why two different measurements? Not everyone drives in a city in stop-and-go traffic. Having only one measurement may not reflect the reality that you may travel long distances on the highway. Offering both city and highway measurements allows the consumers to decide which metric relates closer to their actual usage.
Should you expect your actual mileage to be the exact same as the standardized test?Of course not. Nobody drives exactly 11 miles in the city every morning with 23 stops along the way,or 10 miles on the highway at the exact speeds listed.The EPA's famous phrase "your mileage may vary" has been quickly adopted into popular culture's lexicon. All kinds of factors, like weather, distance, anddriving style can cause people to get better or worse mileage than thestandardized tests would estimate.
Want more accurate results that reflect your driving pattern, in specific conditions that you are most likely to drive in? You could rentdifferent vehicles for a week and drive them around yourself, keeping track of whereyou go, and how fast you drove, and how many gallons of gas you purchased, so thatyou can then repeat the process with another rental, and so on, and then use yourown findings to base your comparisons. Perhaps you find that your results are always20% worse than EPA estimates when you drive in the city, and 10% worse when you driveon the highway. Perhaps you have many mountains and hills where you drive, you drive too fast, you run the Air Conditioner too cold, or whatever.
If you did this with five or more vehicles, and ranked them best to worstfrom your own findings, and also ranked them best to worst based on the standardizedresults from the EPA, you likely will find the order to be the same. The vehiclewith the best standardized result will likely also have the best result from your ownexperience with the rental cars. The vehicle with the worst standardized result willlikely match the worst result from your rental cars.
(This will be one of my main points, that standardized estimates don't have to be accurate to beuseful in making comparisons. The comparisons and decisions you would make with estimatesare the same as you would have made with actual results, or customized estimates based on current workloads. Because the rankings are in the same order, they are relevant and useful for making decisions based on those comparisons.)
Most people shopping around for a new vehicle do not have the time or patience to do this with rental cars. Theycan use the EPA-certified standardized results to make a "ball-park" estimate on how much they will spendin gasoline per year, decide only on cars that might go a certain distancebetween two cities on a single tank of gas, or merely to provide ranking of thevehicles being considered. While mileage may not be the only metric used in making a purchase decision, it can certainly be used to help reduce your consideration setand factor in with other attributes, like number of cup-holders, or leather seats.
In this regard, the Storage Performance Council has developed two benchmarks that attempt to reflect normal business usage, similar to "City" and "Highway" driving measurements.
SPC-1 consists of a single workload designed to demonstrate the performance of a storage subsystem while performing the typical functions of business critical applications. Those applications are characterized by predominately random I/O operations and require both queries as well as update operations. Examples of those types of applications include OLTP, database operations, and mail server implementations.
SPC-2 consists of three distinct workloads designed to demonstrate the performance of a storage subsystem during the execution of business critical applications that require the large-scale, sequential movement of data. Those applications are characterized predominately by large I/Os organized into one or more concurrent sequential patterns. A description of each of the three SPC-2 workloads is listed below as well as examples of applications characterized by each workload.
Large File Processing: Applications in a wide range of fields, which require simple sequential process of one or more large files such as scientific computing and large-scale financial processing.
Large Database Queries: Applications that involve scans or joins of large relational tables, such as those performed for data mining or business intelligence.
Video on Demand: Applications that provide individualized video entertainment to a community of subscribers by drawing from a digital film library.
The SPC-2 benchmark was added when people suggested that not everyone runs OLTP anddatabase transactional update workloads, just as the "Highway" measurement was addedto address the fact that not everyone drives in the City.
If you are one of the customers out there willing to spend the time and resources to do your own performance benchmarking, either at your own data center, or with theassistance of a storage provider, I suspect most, if not all, the major vendors(including IBM, EMC and others), and perhaps even some of the smaller start-ups, would be glad to work with you.
If you want to gather performance data of your actual workloads, and use this to estimate how your performance might be with a new or different storage configuration, IBMhas tools to make these estimates, and I suspect (again) that most, if not all, of theother storage vendors have developed similar tools.
For the rest of you who are just looking to decide which storage vendors to invite on your next RFP, and which products you might like to investigate that matchthe level of performance you need for your next project or application deployment,than the SPC benchmarks might help you with this decision. If performance is importantto you, factor these benchmark comparisons with the rest of the attributes you arelooking for in a storage vendor and a storage system.
In my opinion, I feel that for some people, the SPC benchmarks provide some value in this decision making process. They are proportionally correct, in that even ifyour workload gets only a portion of the SPC estimate, that storage systems withfaster benchmarks will provide you better performance than storage systems with lower benchmark results. That is why I feel they can be relevant in makingvalid comparisons for purchase decisions.
Hopefully, I have provided enough "food for thought"on this subject to support why IBM participates in the Storage Performance Council, why the performance of the SAN Volume Controller can be compared to the performanceof other disk systems, and why we at IBM are proud of the recent benchmark results in our recent press release.
Well, this week I am in Maryland, just outside of Washington DC. It's a bit cold here.
Robin Harris over at StorageMojo put out this Open Letter to Seagate, Hitachi GST, EMC, HP, NetApp, IBM and Sun about the results of two academic papers, one from Google, and another from Carnegie Mellon University (CMU). The papers imply that the disk drive module (DDM) manufacturers have perhaps misrepresented their reliability estimates, and asks major vendors to respond. So far, NetAppand EMC have responded.
I will not bother to re-iterate or repeat what others have said already, but make just a few points. Robin, you are free to consider this "my" official response if you like to post it on your blog, or point to mine, whatever is easier for you. Given that IBM no longer manufacturers the DDMs we use inside our disk systems, there may not be any reason for a more formal response.
Coke and Pepsi buy sugar, Nutrasweet and Splenda from the same sources
Somehow, this doesn't surprise anyone. Coke and Pepsi don't own their own sugar cane fields, and even their bottlers are separate companies. Their job is to assemble the components using super-secret recipes to make something that tastes good.
IBM, EMC and NetApp don't make DDMs that are mentioned in either academic study. Different IBM storage systems uses one or more of the following DDM suppliers:
Seagate (including Maxstor they acquired)
Hitachi Global Storage Technologies, HGST (former IBM division sold off to Hitachi)
In the past, corporations like IBM was very "vertically-integrated", making every component of every system delivered.IBM was the first to bring disk systems to market, and led the major enhancements that exist in nearly all disk drives manufactured today. Today, however, our value-add is to take standard components, and use our super-secret recipe to make something that provides unique value to the marketplace. Not surprisingly, EMC, HP, Sun and NetApp also don't make their own DDMs. Hitachi is perhaps the last major disk systems vendor that also has a DDM manufacturing division.
So, my point is that disk systems are the next layer up. Everyone knows that individual components fail. Unlike CPUs or Memory, disks actually have moving parts, so you would expect them to fail more often compared to just "chips".
If you don't feel the MTBF or AFR estimates posted by these suppliers are valid, go after them, not the disk systems vendors that use their supplies. While IBM does qualify DDM suppliers for each purpose, we are basically purchasing them from the same major vendors as all of our competitors. I suspect you won't get much more than the responses you posted from Seagate and HGST.
American car owners replace their cars every 59 months
According to a frequently cited auto market research firm, the average time before the original owner transfers their vehicle -- purchased or leased -- is currently 59 months.Both studies mention that customers have a different "definition" of failure than manufacturers, and often replace the drives before they are completely kaput. The same is true for cars. Americans give various reasons why they trade in their less-than-five-year cars for newer models. Disk technologies advance at a faster pace, so it makes sense to change drives for other business reasons, for speed and capacity improvements, lower power consumption, and so on.
The CMU study indicated that 43 percent of drives were replaced before they were completely dead.So, if General Motors estimated their cars lasted 9 years, and Toyota estimated 11 years, people still replace them sooner, for other reasons.
At IBM, we remind people that "data outlives the media". True for disk, and true for tape. Neither is "permanent storage", but rather a temporary resting point until the data is transferred to the next media. For this reason, IBM is focused on solutions and disk systems that plan for this inevitable migration process. IBM System Storage SAN Volume Controller is able to move active data from one disk system to another; IBM Tivoli Storage Manager is able to move backup copies from one tape to another; and IBM System Storage DR550 is able to move archive copies from disk and tape to newer disk and tape.
If you had only one car, then having that one and only vehicle die could be quite disrupting. However, companies that have fleet cars, like Hertz Car Rentals, don't wait for their cars to completely stop running either, they replace them well before that happens. For a large company with a large fleet of cars, regularly scheduled replacement is just part of doing business.
This brings us to the subject of RAID. No question that RAID 5 provides better reliability than having just a bunch of disks (JBOD). Certainly, three copies of data across separate disks, a variation of RAID 1, will provide even more protection, but for a price.
Robin mentions the "Auto-correlation" effect. Disk failures bunch up, so one recent failure might mean another DDM, somewhere in the environment, will probably fail soon also. For it to make a difference, it would (a) have to be a DDM in the same RAID 5 rank, and (b) have to occur during the time the first drive is being rebuilt to a spare volume.
The human body replaces skin cells every day
So there are individual DDMs, manufactured by the suppliers above; disk systems, manufactured by IBM and others, and then your entire IT infrastructure. Beyond the disk system, you probably have redundant fabrics, clustered servers and multiple data paths, because eventually hardware fails.
People might realize that the human body replaces skin cells every day. Other cells are replaced frequently, within seven days, and others less frequently, taking a year or so to be replaced. I'm over 40 years old, but most of my cells are less than 9 years old. This is possible because information, data in the form of DNA, is moved from old cells to new cells, keeping the infrastructure (my body) alive.
Our clients should approach this in a more holistic view. You will replace disks in less than 3-5 years. While tape cartridges can retain their data for 20 years, most people change their tape drives every 7-9 years, and so tape data needs to be moved from old to new cartridges. Focus on your information, not individual DDMs.
What does this mean for DDM failures. When it happens, the disk system re-routes requests to a spare disk, rebuilding the data from RAID 5 parity, giving storage admins time to replace the failed unit. During the few hours this process takes place, you are either taking a backup, or crossing your fingers.Note: for RAID5 the time to rebuild is proportional to the number of disks in the rank, so smaller ranks can be rebuilt faster than larger ranks. To make matters worse, the slower RPM speeds and higher capacities of ATA disks means that the rebuild process could take longer than smaller capacity, higher speed FC/SCSI disk.
According to the Google study, a large portion of the DDM replacements had no SMART errors to warn that it was going to happen. To protect your infrastructure, you need to make sure you have current backups of all your data. IBM TotalStorage Productivity Center can help identify all the data that is "at risk", those files that have no backup, no copy, and no current backup since the file was most recently changed. A well-run shop keeps their "at risk" files below 3 percent.
So, where does that leave us?
ATA drives are probably as reliable as FC/SCSI disk. Customers should chose which to use based on performance and workload characteristics. FC/SCSI drives are more expensive because they are designed to run at faster speeds, required by some enterprises for some workloads. IBM offers both, and has tools to help estimate which products are the best match to your requirements.
RAID 5 is just one of the many choices of trade-offs between cost and protection of data. For some data, JBOD might be enough. For other data that is more mission critical, you might choose keeping two or three copies. Data protection is more than just using RAID, you need to also consider point-in-time copies, synchronous or asynchronous disk mirroring, continuous data protection (CDP), and backup to tape media. IBM can help show you how.
Disk systems, and IT environments in general, are higher-level concepts to transcend the failures of individual components. DDM components will fail. Cache memory will fail. CPUs will fail. Choose a disk systems vendor that combines technologies in unique and innovative ways that take these possibilities into account, designed for no single point of failure, and no single point of repair.
So, Robin, from IBM's perspective, our hands are clean. Thank you for bringing this to our attention and for giving me the opportunity to highlight IBM's superiority at the systems level.
It's Tuesday, and you know what that means? IBM Announcements! This week I am in beautiful Orlando, Florida for the [IBM Systems Technical University] conference.
This week, IBM announced its latest tape offerings for the seventh generation of Linear Tape Open (LTO-7), providing huge gains in performance and capacity.
For capacity, the new LTO-7 cartridges can hold up to 6TB native capacity, or 15TB effective capacity with 2.5x compression that for typical data. That is 2.4x larger than the 2.5TB catridges available with LTO-6. Performance is also nearly doubled, with a native throughput of 315 MB/sec, or effective 780 MB/sec effective capacity with 2.5x compression. The LTO consortium, of which IBM is a founding member, has published the roadmap for LTO generations to LTO-8, LTO-9 and LTO-10.
IBM will offer both half-height and full-height LTO-7 tape drives. All the features you love from LTO-6 like WORM, partitioning and Encryption carry forward. These drives will be supported on a variety of distributed operating systems, including Linux on z System mainframes, and the IBM i platform on POWER Systems.
The Linear Tape File System (LTFS) can be used to treat LTO-7 cartridges in much the same way as Compact Discs or USB memory sticks, allowing one person to create conent on an LTO-7 tape cartridge, and pass that cartridge to the next employee, or to another company. LTFS is also the basis for IBM Spectrum Archive that allows tape data to be part of a global namespace with IBM Spectrum Scale.
LTO-7 will be supported on the TS2900 auto-loader, as well as all of IBM's tape libraries: TS3100, TS3200, TS3310, TS3500 and TS4500. You can connect up to 15 TS3500 tape libraries together with shuttle connectors, for a maximum capacity of 2,700 drives serving 300,000 cartridges, for a maximum capacity of 1.8 Exabytes of data in a single system environment.
In addition to LTO-7 support, the IBM TS4500 tape library was also enchanced. You can now grow it up to 18 frames, and have up to 128 drives serving 23,170 cartridges, for a maximum capacity of 139 PB of data. You can now also intermix LTO and 3592 frames in the same TS4500 tape library.
For comptability, LTO-7 drives can read existing LTO-5 and LTO-6 tape cartridges, and can write to LTO-6 media, to help clients with transition.
Did you miss IBM Pulse 2013 this week? I wasn't there either, having scheduled visits with clients in Washington DC this week, only to have those meetings cancelled due to the [U.S. sequestration cuts].
Fortunately, there are plenty of videos and materials to review from the event. Here's a [12-minute video] interview between Laura DuBois, Program VP of Storage for industry analyst firm [IDC], and fellow IBM executive Steve "Woj" Wojtowecz, VP of Tivoli Storage and Networking Software.
(Update: Apparently, IBM had not secured re-distribution rights from IDC to post this video prior to my blog post. IBM now has full permission to distribute. My apologies for any inconvenience last week.)
The two discuss client opportunities and requirements for storage clouds and compute clouds. Client cloud storage requirements include backup and archive clouds, file storage clouds, and storage that supports compute cloud environments.
If you store your VMware bits on external SAN or NAS-based disk storage systems, this post is for you. The subject of the post, VM Volumes, is a potential storage management game changer!
Fellow blogger Stephen Foskett mentioned VM Volumes in his [Introducing VMware vSphere Storage Features] presentation at IBM Edge 2012 conference. His session on VMware's storage features included VMware APIs for Array Integration (VAAI), VMware Array Storage Awareness (VASA), vCenter plug-ins, and a new concept he called "vVol", now more formally known as VM Volumes. This post provides a follow-up to this, describing the VM Volumes concepts, architecture, and value proposition.
"VM Volumes" is a future architecture that VMware is developing in collaboration with IBM and other major storage system vendors. So far, very little information about VM Volumes has been released. At VMworld 2012 Barcelona, VMware highlights VM Volumes for the first time and IBM demonstrates VM Volumes with the IBM XIV Storage System (more about this demo below). VM Volumes is worth your attention -- when it becomes generally available, everyone using storage arrays will have to reconsider their storage management practices in a VMware environment -- no exaggeration!
But enough drama. What is this all about?
(Note: for the sake of clarity, this post refers to block storage only. However, the VM Volumes feature applies to NAS systems as well. Special thanks to Yossi Siles and the XIV development team for their help on this post!)
The VM Volumes concept is simple: VM disks are mapped directly to special volumes on a storage array system, as opposed to storing VMDK files on a vSphere datastore.
The following images illustrate the differences between the two storage management paradigms.
You may still be asking yourself: bottom line, how will I benefit from VM Volumes?
Well, take a VM snapshot for example. With VM Volumes, vSphere can simply offload the operation by invoking a hardware snapshot of the hardware volume. This has significant implications:
VM-Granularity: Only the right VMs are copied (with datastores, backing up or cloning individual-VM portions of hardware snapshot of a datastore would require more complex configuration, tools and work)
Hardware Offload: No ESXi server resources are consumed
XIV advantage: With XIV, snapshots consume no space upfront and are completed instantly.
Here's the first takeaway: With VM Volumes, advanced storage services (which cost a lot when you buy a storage array), will become available at an individual VM level. In a cloud world, this means that applications can be provisioned easily with advanced storage services, such as snapshots and mirroring.
Now, let's take a closer look at another relevant scenario where VM Volumes will make a lot of difference - provisioning an application with special mirroring requirements:
VM Volumes case: The application is ordered via the private cloud portal. The requestor checks a box requesting an asynchronous mirror. He changes the default RPO for his needs. When the request is submitted, the process wraps up automatically: Volumes are created on one of the storage arrays, configured with a mirror and RPO exactly as specified. A few minutes later, the requestor receives an automatic mail pointing to the application virtual machine.
Datastores case #1: As may be expected, a datastore that is mirrored with the special RPO does not exist. As a result, the automated workflow sets a pending status on the request, creates an urgent ticket to a VMware administrator and aborts. When the VMware admin handles that ticket, she re-assigns the ticket to the storage administrator, asking for a new volume which is mirrored with the special RPO, and mapped to the right ESXi cluster. The next day, the volume is created; the ticket is re-assigned to the storage admin, with the new LUN being pointed to. The VMware administrator follows and creates the datastore on top of it. Since the automated workflow was aborted, the admin re-assigns the ticket to the cloud administrator, who sometime later completes the application provisioning manually.
Datastores case #2: Luckily for the requestor, a datastore that is mirrored with the special RPO does exist. However, that particular datastore is consuming space from a high performance XIV Gen3 system with SSD caching, while the application does not require that level of performance, so the workflow requires a storage administrator approval. The approval is given to save time, but the storage administrator opens a ticket for himself to create a new volume on another array, as well as a follow-up ticket for the VMware admin to create a new datastore using the new volume and migrate the application to the other datastore. In this case, provisioning was relatively rapid, but required manual follow up, involving the two administrators.
Here's the second takeaway: With VM Volumes, management is simplified, and end-to-end automation is much more applicable. The reason is that there are no datastores. Datastores physically group VMs that may otherwise be totally unrelated, and require close coordination between storage and VMware administrators.
Now, the above mainly focuses on the VMware or cloud administrator perspective. How does VM Volumes impact storage management?
VM's are the new hosts: Today, storage administrators have visibility of physical hosts in their management environment. In a non-virtualized environment, this visibility is very helpful. The storage administrator knows exactly which applications in a data center are storage-provisioned or affected by storage management operations because the applications are running on well-known hosts. However, in virtualized environments the association of an application to a physical host is temporary. To keep at least the same level of visibility as in physical environments, VMs should become part of the storage management environment, like hosts. Hosts are still interesting, for example to manage physical storage mapping, but without VM visibility, storage administrators will know less about their operation than they are used to, or need to. VM Volumes enables such visibility, because volumes are provided to individual VMs. The XIV VM Volumes demonstration at VMworld Barcelona, although experimental, shows a view of VM volumes, in XIV's management GUI.
Here's a screenshot:
That's not all!
Storage Profiles and Storage Containers: A Storage Profile is a vSphere specification of a set of storage services. A storage profile can include properties like thin or thick provisioning, mirroring definition, snapshot policy, minimum IOPS, etc.
Storage administrators define a portfolio of supported storage services, maintained as a set of storage profiles, and published (via VASA integration) to vSphere.
VMware or cloud administrators define the required storage profiles for specific applications
VMware and storage administrators need to coordinate the typical storage requirements and the automatically-available storage services. When a request to provision an application is made, the associated storage profiles are matched against the published set of available storage profiles. The matching published profiles will be used to create volumes, which will be bound to the application VMs. All that will happen automatically.
Note that when a VM is created today, a datastore must be specified. With VM Volumes, a new management entity called Storage Container (also known as Capacity Pool) replaces the use of datastore as a management object. Each Storage Container exposes a subset of the available storage profiles, as appropriate. The storage container also has a capacity quota.
Here are some more takeaways:
New way to interface vSphere and storage management: Storage administrators structure and publish storage services to vSphere via storage profiles and storage containers.
Automated provisioning, out of the box: The provisioning process automatically matches application-required storage profiles against storage profiles available from the specified storage containers. There is no need to build custom scripts and custom processes to automate storage provisioning to applications
The XIV advantage:
XIV services are very simple to define and publish. The typical number of available storage profiles would be low. It would also be easy to define application storage profiles.
XIV provides consistent high performance, up to very high capacity utilization levels, without any maintenance. As a result, automated provisioning (which inherently implies less human attention) will not create an elevated risk of reduced performance.
Note: A storage vendor VASA provider is required to support VM Volumes, storage profiles, storage containers and automated provisioning. The IBM Storage VASA provider runs as a standalone service that needs to be deployed on a server.
To summarize the VM Volumes value proposition:
Streamline cloud operation by providing storage services at VM and application level, enabling end-to-end provisioning automation, and unifying VMware and storage administration around volumes and VMs.
Increase storage array ROI, improve vSphere scalability and response time, and reduce cloud provisioning lag, by offloading VM-level provisioning, failover, backup, storage migration, storage space recycling, monitoring, and more, to the storage array, using advanced storage operations such as mirroring and snapshots.
Simplify the adoption of VM Volumes using XIV, with smaller and simpler sets of storage profiles. Apply XIV's supreme fast cloning to individual VMs, and keep automation risks at bay with XIV's consistent high performance.
Until you can get your hands on a VM Volumes-capable environment, the VMware and IBM developer groups will be collaborating and working hard to realize this game-changing feature. The above information is definitely expected to trigger your questions or comments, and our development teams are eager to learn from them and respond. Enter your comments below, and I will try to answer them, and help shape the next post on this subject. There's much more to be told.
A reader from New Zealand expressed concern some corporate bloggers were [using the earthquake for marketing]. He lost someone close to him in Christchurch, and is unable to reach a friend living in Japan, so I am sorry for his loss. I plan to be in Australia and New Zealand to teach a Top Gun class May 15-27, so hopefully I will be able to meet him in person when I am down there.
"Earmarking funds is a really good way of hobbling relief organizations and ensuring that they have to leave large piles of money unspent in one place while facing urgent needs in other places. ... Meanwhile, the smaller and less visible emergencies where NGOs can do the most good are left unfunded.
In the specific case of Japan, there's all the more reason not to donate money. Japan is a wealthy country which is responding to the disaster, among other things, by printing hundreds of billions of dollars' worth of new money."
Another reader mentioned that the last surviving American WW-II vet died the same week. WTF? IBM and Japan have been allies for quite a while now, and there is no reason to bring up past wars except to compare the scope and magnitude of the cleanup effort. (Update: Frank Buckles was the last surviving WW-I vet, but also served in WW-II).
Many readers felt that charity begins at home, and there are plenty of worthy causes right here in the USA to donate to instead. Inspired by last year's movie [Waiting for Superman], my girlfriend started a project called [Centers for My Super Stars] for her first grade class on DonorsChoose.org. For those not familiar with this website, DonorsChoose.org uses the cloud to connect school teachers in need of supplies with rich people to donate funds towards these projects. If you want to contribute to her project, [donate here].
"And speaking of class, there just happens to be a baseball team in Sendai, Japan. The Golden Eagles. Their stadium was severely damaged from the earthquake. Wouldn't you think some of them lug nuts who run American baseball would bring the Golden Eagles and their opponents over to the United States when the Japanese season starts -- play some games over here and raise money to help the Japanese? Wouldn't you think they could just once stop that national pastime stuff and help the international pastime?"
As you can see, different readers have different opinions on this. We are all on this world together, and both our economy and our ecology are more interconnected than you might think. Let's build a smarter planet.
Full VMware Vstorage API for Array Integration (VAAI). Back in 2008, VMware announced new vStorage APIs for its vSphere ESX hypervisor: vStorage API for Site Recovery Manager, vStorage API for Data Potection, vStorage API for Multipathing. Last July, VMware added a new API called vStorage API for Array Integration [VAAI] which offers three primitives:
Hardware-assisted Blocks zeroing. Sometimes referred to as "Write Same", this SCSI command will zero out a large section of blocks, presumably as part of a VMDK file. This can then be used to reclaim space on the XIV on thin-provisioned LUNs.
Hardware-assisted Copy. Make an XIV snapshot of data without any I/O on the server hardware.
Hardware-assisted locking. On mainframes, this is call Parallel Access Volumes (PAV). Instead of locking an entire LUN using standard SCSI reserve commands, this primitive allows an ESX host to lock just an individual block so as not to interfere with other hosts accessing other blocks on that same LUN.
Quality of Service (QoS) Performance Classes.
When XIV was first released, it treated all hosts and all data the same, even when deployed for a variety of different applications. This worked for some clients, such as [Medicare y Mucho Más]. They migrated their databases, file servers and email system from EMC CLARiiON to an IBM XIV Storage System. In conjunction with VMware, the XIV provides a highly flexible and scalable virtualized architecture, which enhances the company's business agility.
However, other clients were skeptical, and felt they needed additional "nobs" to prioritize different workloads. The new 10.2.4 microcode allows you to define four different "performance classes". This is like the door of a nightclub. All the regular people are waiting in a long line, but when a celebrity in a limo arrives, the bouncer unclips the cord, and lets the celebrity in. For each class, you provide IOPS and/or MB/sec targets, and the XIV manages to those goals. Performance classes are assigned to each host based on their value to the business.
Offline Initialization for Asynchronous Mirror.
Internally, we called this Truck Mode. Normally, when a customer decides to start using Asynchronous Mirror, they already have a lot of data at the primary location, and so there is a lot of data to send over to the new XIV box at the secondary location. This new feature allows the data to be dumped to tape at the primary location. Those tapes are shipped to the secondary location and restored on the empty XIV. The two XIV boxes are then connected for Asynchronous Mirroring, and checksums of each 64KB block are compared to determine what has changed at the primary during this "tape delivery time". This greatly reduces the time it takes for the two boxes to get past the initial synchronization phase.
IP-based Replication. When IBM first launched the Storwize V7000 last October, people commented that the one feature they felt missing was IP-based replication. Sure, we offered FCP-based replication as most other Enterprise-class disk systems offer today, but many midrange systems also offer IP-based repliation to reduce the need for expensive FCIP routers. [IBM Tivoli Storage FastBack for Storwize V7000] provides IP-based replication for Storwize V7000 systems.
Network Attached Storage
IBM announced two new models of the IBM System Storage N series. The midrange N6240 supports up to 600 drives, replacing the N6040 system. The entry-level N6210 supports up to 240 drives, and replaces the N3600 system. Details for both are available on the latest [data sheet].
IBM Real-Time Compression appliances work with all N series models to provide additional storage efficiency. Last October, I provided the [Product Name Decoder Ring] for the STN6500 and STN6800 models. The STN6500 supports 1 GbE ports, and the STN6800 supports 10GbE ports (or a mix of 10GbE and 1GbE, if you prefer). The IBM versions of these models were announced last December, but some people were on vacation and might have missed it. For more details of this, read the [Resources page], the [landing page], or [watch this video].
IBM System Storage DS3000 series
IBM System Storage [DS3524 Express DC and EXP3524 Express DC] models are powered with direct current (DC) rather than alternating current (AC). The DS3524 packs dual controllers and two dozen small-form factor (2.5 inch) drives in a compact 2U-high rack-optimized module. The EXP3524 provides addition disk capacity that can be attached to the DS3524 for expansion.
Large data centers, especially those in the Telecommunications Industry, receive AC from their power company, then store it in a large battery called an Uninterruptible Power Supply (UPS). For DC-powered equipment, they can run directly off this battery source, but for AC-powered equipment, the DC has to be converted back to AC, and some energy is lost in the conversion. Thus, having DC-powered equipment is more energy efficient, or "green", for the IT data center.
Whether you get the DC-powered or AC-powered models, both are NEBS-compliant and ETSI-compliant.
New Tape Drive Options for Autoloaders and Libraries
IBM System Storage [TS2900 Autoloader] is a compact 1U-high tape system that supports one LTO drive and up to 9 tape cartridges. The TS2900 can support either an LTO-3, LTO-4 or LTO-5 half-height drive.
IBM System Storage [TS3100 and TS3200 Tape Libraries] were also enhanced. The TS3100 can accomodate one full-height LTO drive, or two half-height drives, and hold up to 24 cartridges. The TS3200 offers twice as many drives and space for cartridges.
If we have learned anything from last decade's Y2K crisis, is that we should not wait for the last minute to take action. Now is the time to start thinking about weaning ourselves off Windows XP. IBM has 400,000 employees, so this is not a trivial matter.
Already, IBM has taken some bold steps:
Last July, IBM announced that it was switching from Internet Explorer (IE6) to [Mozilla Firefox as its standard browser]. IBM has been contributing to this open source project for years, including support for open standards, and to make it [more accessible to handicapped employees with visual and motor impairments]. I use Firefox already on Windows, Mac and Linux, so there was no learning curve for me. Before this announcement, if some web-based application did not work on Firefox, our Helpdesk told us to switch back to Internet Explorer. Those days are over. Now, if a web-based application doesn't work on Firefox, we either stop using it, or it gets fixed.
IBM also announced the latest [IBM Lotus Symphony 3] software, which replaces Microsoft Office for Powerpoint, Excel and Word applications. Symphony also works across Mac, Windows and Linux. It is based on the OpenOffice open source project, and handles open-standard document formats (ODF). Support for Microsoft Office 2003 will also run out in the year 2014, so moving off proprietary formats to open standards makes sense.
I am not going to wait for IBM to decide how to proceed next, so I am starting my own migrations. In my case, I need to do it twice, on my IBM-provided laptop as well as my personal PC at home.
Last summer, IBM sent me a new laptop, we get a new one every 3-4 years. It was pre-installed with Windows XP, but powerful enough to run a 64-bit operating system in the future. Here are my series of blog posts on that:
I decided to try out Red Hat Enterprise Linux 6.1 with its KVM-based Red Hat Enterprise Virtualization to run Windows XP as a guest OS. I will try to run as much as I can on native Linux, but will have Windows XP guest as a next option, and if that still doesn't work, reboot the system in native Windows XP mode.
So far, I am pleased that I can do nearly everything my job requires natively in Red Hat Linux, including accessing my Lotus Notes for email and databases, edit and present documents with Lotus Symphony, and so on. I have made RHEL 6.1 my default when I boot up. Setting up Windows XP under KVM was relatively simple, involving an 8-line shell script and 54-line XML file. Here is what I have encountered:
We use a wonderful tool called "iSpring Pro" which merges Powerpoint slides with voice recordings for each page into a Shockwave Flash video. I have not yet found a Linux equivalent for this yet.
To avoid having to duplicate files between systems, I use instead symbolic links. For example, my Lotus Notes local email repository sits on D: drive, but I can access it directly with a link from /home/tpearson/notes/data.
While my native Ubuntu and RHEL Linux can access my C:, D: and E: drives in native NTFS file system format, the irony is that my Windows XP guest OS under KVM cannot. This means moving something from NTFS over to Ext4, just so that I can access it from the Windows XP guest application.
For whatever reason, "Password Safe" did not run on the Windows XP guest. I launch it, but it takes forever to load and never brings up the GUI. Fortunately, there is a Linux version [MyPasswordSafe] that seems to work just fine to keep track of all my passwords.
Personal home PC
My Windows XP system at home gave up the ghost last month, so I bought a new system with Windows 7 Professional, quad-core Intel processor and 6GB of memory. There are [various editions of Windows 7], but I chose Windows 7 Professional to support running Windows XP as a guest image.
Here's is how I have configured my personal computer:
I actually found it more time-consuming to implement the "Virtual PC" feature of Windows 7 to get Windows XP mode working than KVM on Red Hat Linux. I am amazed how many of my Windows XP programs DO NOT RUN AT ALL natively on Windows 7. I now have native 64-bit versions of Lotus Notes and Symphony 3, which will do well enough for me for now.
I went ahead and put Red Hat Linux on my home system as well, but since I have Windows XP running as a guest under Windows 7, no need to duplicate KVM setup there. At least if I have problems with Windows 7, I can reboot in RHEL6 Linux at home and use that for Linux-native applications.
Hopefully, this will position me well in case IBM decides to either go with Windows 7 or Linux as the replacement OS for Windows XP.
A long time ago, perhaps in the early 1990s, I was an architect on the component known today as DFSMShsm on z/OS mainframe operationg system. One of my job responsibilities was to attend the biannual [SHARE conference to listen to the requirements of the attendees on what they would like added or changed to the DFSMS, and ask enough questions so that I can accurately present the reasoning to the rest of the architects and software designers on my team. One person requested that the DFSMShsm RELEASE HARDCOPY should release "all" the hardcopy. This command sends all the activity logs to the designated SYSOUT printer. I asked what he meant by "all", and the entire audience of 120 some attendees nearly fell on the floor laughing. He complained that some clever programmer wrote code to test if the activity log contained only "Starting" and "Ending" message, but no error messages, and skip those from being sent to SYSOUT. I explained that this was done to save paper, good for the environment, and so on. Again, howls of laughter. Most customers reroute the SYSOUT from DFSMS from a physical printer to a logical one that saves the logs as data sets, with date and time stamps, so having any "skipped" leaves gaps in the sequence. The client wanted a complete set of data sets for his records. Fair enough.
When I returned to Tucson, I presented the list of requests, and the immediate reaction when I presented the one above was, "What did he mean by ALL? Doesn't it release ALL of the logs already?" I then had to recap our entire dialogue, and then it all made sense to the rest of the team. At the following SHARE conference six months later, I was presented with my own official "All" tee-shirt that listed, and I am not kidding, some 33 definitions for the word "all", in small font covering the front of the shirt.
I am reminded of this story because of the challenges explaining complicated IT concepts using the English language which is so full of overloaded words that have multiple meanings. Take for example the word "protect". What does it mean when a client asks for a solution or system to "protect my data" or "protect my information". Let's take a look at three different meanings:
The first meaning is to protect the integrity of the data from within, especially from executives or accountants that might want to "fudge the numbers" to make quarterly results look better than they are, or to "change the terms of the contract" after agreements have been signed. Clients need to make sure that the people authorized to read/write data can be trusted to do so, and to store data in Non-Erasable, Non-Rewriteable (NENR) protected storage for added confidence. NENR storage includes Write-Once, Read-Many (WORM) tape and optical media, disk and disk-and-tape blended solutions such as the IBM Grid Medical Archive Solution (GMAS) and IBM Information Archive integrated system.
The second meaning is to protect access from without, especially hackers or other criminals that might want to gather personally-identifiably information (PII) such as social security numbers, health records, or credit card numbers and use these for identity theft. This is why it is so important to encrypt your data. As I mentioned in my post [Eliminating Technology Trade-Offs], IBM supports hardware-based encryption FDE drives in its IBM System Storage DS8000 and DS5000 series. These FDE drives have an AES-128 bit encryption built-in to perform the encryption in real-time. Neither HDS or EMC support these drives (yet). Fellow blogger Hu Yoshida (HDS) indicates that their USP-V has implemented data-at-rest in their array differently, using backend directors instead. I am told EMC relies on the consumption of CPU-cycles on the host servers to perform software-based encryption, either as MIPS consumed on the mainframe, or using their Powerpath multi-pathing driver on distributed systems.
There is also concern about internal employees have the right "need-to-know" of various research projects or upcoming acquisitions. On SANs, this is normally handled with zoning, and on NAS with appropriate group/owner bits and access control lists. That's fine for LUNs and files, but what about databases? IBM's DB2 offers Label-Based Access Control [LBAC] that provides a finer level of granularity, down to the row or column level. For example, if a hospital database contained patient information, the doctors and nurses would not see the columns containing credit card details, the accountants would not see the columnts containing healthcare details, and the individual patients, if they had any access at all, would only be able to access the rows related to their own records, and possibly the records of their children or other family members.
The third meaning is to protect against the unexpected. There are lots of ways to lose data: physical failure, theft or even incorrect application logic. Whatever the way, you can protect against this by having multiple copies of the data. You can either have multiple copies of the data in its entirety, or use RAID or similar encoding scheme to store parts of the data in multiple separate locations. For example, with RAID-5 rank containing 6+P+S configuration, you would have six parts of data and one part parity code scattered across seven drives. If you lost one of the disk drives, the data can be rebuilt from the remaining portions and written to the spare disk set aside for this purpose.
But what if the drive is stolen? Someone can walk up to a disk system, snap out the hot-swappable drive, and walk off with it. Since it contains only part of the data, the thief would not have the entire copy of the data, so no reason to encrypt it, right? Wrong! Even with part of the data, people can get enough information to cause your company or customers harm, lose business, or otherwise get you in hot water. Encryption of the data at rest can help protect against unauthorized access to the data, even in the case when the data is scattered in this manner across multiple drives.
To protect against site-wide loss, such as from a natural disaster, fire, flood, earthquake and so on, you might consider having data replicated to remote locations. For example, IBM's DS8000 offers two-site and three-site mirroring. Two-site options include Metro Mirror (synchronous) and Global Mirror (asynchronous). The three-site is cascaded Metro/Global Mirror with the second site nearby (within 300km) and the third site far away. For example, you can have two copies of your data at site 1, a third copy at nearby site 2, and two more copies at site 3. Five copies of data in three locations. IBM DS8000 can send this data over from one box to another with only a single round trip (sending the data out, and getting an acknowledgment back). By comparison, EMC SRDF/S (synchronous) takes one or two trips depending on blocksize, for example blocks larger than 32KB require two trips, and EMC SRDF/A (asynchronous) always takes two trips. This is important because for many companies, disk is cheap but long-distance bandwidth is quite expensive. Having five copies in three locations could be less expensive than four copies in four locations.
Fellow blogger BarryB (EMC Storage Anarchist) felt I was unfair pointing out that their EMC Atmos GeoProtect feature only protects against "unexpected loss" and does not eliminate the need for encryption or appropriate access control lists to protect against "unauthorized access" or "unethical tampering".
(It appears I stepped too far on to ChuckH's lawn, as his Rottweiler BarryB came out barking, both in the [comments on my own blog post], as well as his latest titled [IBM dumbs down IBM marketing (again)]. Before I get another rash of comments, I want to emphasize this is a metaphor only, and that I am not accusing BarryB of having any canine DNA running through his veins, nor that Chuck Hollis has a lawn.)
As far as I know, the EMC Atmos does not support FDE disks that do this encryption for you, so you might need to find another way to encrypt the data and set up the appropriate access control lists. I agree with BarryB that "erasure codes" have been around for a while and that there is nothing unsafe about using them in this manner. All forms of RAID-5, RAID-6 and even RAID-X on the IBM XIV storage system can be considered a form of such encoding as well. As for the amount of long-distance bandwidth that Atmos GeoProtect would consume to provide this protection against loss, you might question any cost savings from this space-efficient solution. As always, you should consider both space and bandwidth costs in your total cost of ownership calculations.
Of course, if saving money is your main concern, you should consider tape, which can be ten to twenty times cheaper than disk, affording you to keep a dozen or more copies, in as many time zones, at substantially lower cost. These can be encrypted and written to WORM media for even more thorough protection.
Continuing my week in Chicago, for the IBM Storage Symposium 2008, we had sessions that focused on individual products. IBM System Storage SAN Volume Controller (SVC) was a popular topic.
SVC - Everything you wanted to know, but were afraid to ask!
Bill Wiegand, IBM ATS, who has been working with SAN Volume Controller since it was first introduced in 2003. answered some frequently asked questions about IBM System Storage SAN Volume Controller.
Do you have to upgrade all of your HBAs, switches and disk arrays to the recommended firmware levels before upgrading SVC? No. These are recommended levels, but not required. If you do plan to update firmware levels, focus on the host end first, switches next, and disk arrays last.
How do we request special support for stuff not yet listed on the Interop Matrix?
Submit an RPQ/SCORE, same as for any other IBM hardware.
How do we sign up for SVC hints and tips? Go to the IBM
[SVC Support Site] and select the "My Notifications" under the "Stay Informed" box on the right panel.
When we call IBM for SVC support, do we select "Hardware" or "Software"?
While the SVC is a piece of hardware, there are very few mechanical parts involved. Unless there are sparks,
smoke, or front bezel buttons dangling from springs, select "Software". Most of the questions are
related to the software components of SVC.
When we have SVC virtualizing non-IBM disk arrays, who should we call first?
IBM has world-renown service, with some of IT's smartest people working the queues. All of the major storage vendors play nice
as part of the [TSAnet Agreement when a mutual customer is impacted.
When in doubt, call IBM first, and if necessary, IBM will contact other vendors on your behalf to resolve.
What is the difference between livedump and a Full System Dump?
Most problems can be resolved with a livedump. While not complete information, it is generally enough,
and is completely non-disruptive. Other times, the full state of the machine is required, so a Full System Dump
is requested. This involves rebooting one of the two nodes, so virtual disks may temporarily run slower on that
What does "svc_snap -c" do?The "svc_snap" command on the CLI generates a snap file, which includes the cluster error log and trace files from all nodes. The "-c" parameter includes the configuration and virtual-to-physical mapping that can be useful for
disaster recovery and problem determination.
I just sent IBM a check to upgrade my TB-based license on my SVC, how long should I wait for IBM to send me a software license key?
IBM trusts its clients. No software license key will be sent. Once the check clears, you are good to go.
During migration from old disk arrays to new disk arrays, I will temporarily have 79TB more disk under SVC management, do I need to get a temporary TB-based license upgrade during the brief migration period?
Nope. Again, we trust you. However, if you are concerned about this at all, contact IBM and they will print out
a nice "Conformance Letter" in case you need to show your boss.
How should I maintain my Windows-based SVC Master Console or SSPC server?
Treat this like any other Windows-based server in your shop, install Microsoft-recommended Windows updates,
run Anti-virus scans, and so on.
Where can I find useful "How To" information on SVC?
Specify "SAN Volume Controller" in the search field of the
[IBM Redbooks vast library of helpful books.
I just added more managed disks to my managed disk group (MDG), can I get help writing a script to redistribute the extents to improve wide-striping performance?
Yes, IBM has scripting tools available for download on
[AlphaWorks]. For example, svctools will take
the output of the "lsinfo" command, and generate the appropriate SVC CLI to re-migrate the disks around to optimize
performance. Of course, if you prefer, you can use IBM Tivoli Storage Productivity Center instead for a more
Any rules of thumb for sizing SVC deployments?
IBM's Disk Magic tool includes support for SVC deployments. Plan for 250 IOPS/TB for light workloads,
500 IOPS/TB for average workloads, and 750 IOPS/TB for heavy workloads.
Can I migrate virtual disks from one manage disk group (MDG) to another of different extent size?
Yes, the new Vdisk Mirroring capability can be used to do this. Create the mirror for your Vdisk between the
two MDGs, wait for the copy to complete, and then split the mirror.
Can I add or replace SVC nodes non-disruptively? Absolutely, see the Technotes
[SVC Node Replacement page.
Can I really order an SVC EE in Flamingo Pink? Yes. While my blog post that started all
this [Pink It and Shrink It] was initially just some Photoshop humor, the IBM product manager for SVC accepted this color choice as an RPQ option.
The default color remains Raven Black.
Continuing my ongoing discussion on Solid State Disk (SSD), fellow blogger BarryB (EMC) points out in his [latest post]:
Oh – and for the record TonyP, I don't think I ever said EMC was using a newer or different EFDs than IBM. I just asserted that EMC knows more than IBM about these EFDs and how they actually work a storage array under real-world workloads.
(Here "EFD" is refers to "Enterprise Flash Drive", EMC's marketing term for Single Layer Cell (SLC) NAND Flash non-volatile solid-state storage devices. Both IBM and EMC have been selling solid-state storage for quite some time now, but EMC felt that a new term was required to distinguish the SLC NAND Flash devices sold in their disk systems from solid-state devices sold in laptops or blade servers. The rest of the industry, including IBM, continues to use the term SSD to refer to these same SLC NAND Flash devices that EMC is referring to.)
Although STEC asserts that IBM is using the latest ZeusIOPS drives, IBM is only offering the 73GB and 146GB STEC drives (EMC is shipping the latest ZeusIOPS drives in 200GB and 400GB capacities for DMX4 and V-Max, affording customers a lower $/GB, higher density and lower power/footprint per usable GB.)
Here is where I enjoy the subtleties between marketing and engineering. Does the above seem like he is saying EMC is using newer or different drives? What are typical readers expected to infer from the statement above?
That there are four different drives from STEC, in four different capacities. In the HDD world, drives of different capacities are often different, and larger capacities are often newer than those of smaller capacities.
That the 200GB and 400GB are the latest drives, and that 73GB and 146GB drives are not the latest.
That STEC press release is making false or misleading claims.
Uncontested, some readers might infer the above and come to the wrong conclusions. I made an effort to set the record straight. I'll summarize with a simple table:
Usable (conservative format)
Usable (aggressive format)
So, we all agree now that the 256GB drives that are formatted as 146GB or 200GB are in fact the same drives, that IBM and EMC both sell the latest drives offered by STEC, and that the STEC press release was in fact correct in its claims.
I also wanted to emphasize that IBM chose the more conservative format on purpose. BarryB [did the math himself] and proved my key points:
Under some write-intensive workloads, an aggressive format may not last the full five years. (But don't worry, BarryB assures us that EMC monitors these drives and replaces them when they fail within the five years under their warranty program.)
Conservative formats with double the spare capacity happen to have roughly double the life expectancy.
I agree with BarryB that an aggressive format can offer a lower $/GB than the conservative format. Cost-conscious consumers often look for less-expensive alternatives, and are often willing to accept less-reliable or shorter life expectancy as a trade-off. However, "cost-conscious" is not the typical EMC targeted customer, who often pay a premiumfor the EMC label. To compensate, EMC offers RAID-6 and RAID-10 configurations to provide added protection. With a conservative format, RAID-5 provides sufficient protection.
(Just so BarryB won't accuse me of not doing my own math, a 7+P RAID-5 using conservative format 146GB drives would provide 1022GB of capacity, versus 4+4 RAID-10 configuration using aggressive format 200GB drives only 800GB total.)
In an ideal world, you the consumer would know exactly how many IOPS your application will generate over the next five years, exactly how much capacity you will require, be offered all three drives in either format to choose from, and make a smart business decision. Nothing, however, is ever this simple in IT.
My post last week [Solid State Disk on DS8000 Disk Systems] kicked up some dust in the comment section.Fellow blogger BarryB (a member of the elite [Anti-Social Media gang from EMC]) tried to imply that 200GB solid state disk (SSD) drives were different or better than the 146GB drives used in IBM System Storage DS8000 disk systems. I pointed out that they are the actual same physical drive, just formatted differently.
To explain the difference, I will first have to go back to regular spinning Hard Disk Drives (HDD). There are variances in manufacturing, so how do you make sure that a spinning disk has AT LEAST the amount of space you are selling it as? The solution is to include extra. This is the same way that rice, flour, and a variety of other commodities are sold. Legally, if it says you are buying a pound or kilo of flour, then it must be AT LEAST that much to be legal labeling. Including some extra is a safe way to comply with the law. In the case of disk capacity, having some spare capacity and the means to use it follows the same general concept.
(Disk capacity is measured in multiples of 1000, in this case a Gigabyte (GB) = 1,000,000,000 bytes, not to be confused with [Gibibyte (GiB)] = 1,073,741,824 bytes, based on multiples of 1024.)
Let's say a manufacturer plans to sell 146GB HDD. We know that in some cases there might be bad sectors on the disk that won't accept written data on day 1, and there are other marginally-bad sectors that might fail to accept written data a few years later, after wear and tear. A manufacturer might design a 156GB drive with 10GB of spare capacity and format this with a defective-sector table that redirects reads/writes of known bad sectors to good ones. When a bad sector is discovered, it is added to the table, and a new sector is assigned out of the spare capacity.Over time, the amount of space that a drive can store diminishes year after year, and once it drops below its rated capacity, it fails to meet its legal requirements. Based on averages of manufacturing runs and material variances, these could then be sold as 146GB drives, with a life expectancy of 3-5 years.
With Solid State Disk, the technology requires a lot of tricks and techniques to stay above the rated capacity. For example, you can format a 256GB drive as a conservative 146GB usable, with an additional 110GB (75 percent) spare capacity to handle all of the wear-leveling. You could lose up to 22GB of cells per year, and still have the rated capacity for the full five-year life expectancy.
Alternatively, you could take a more aggressive format, say 200GB usable, with only 56GB (28 percent) of spare capacity. If you lost 22GB of cells per year, then sometime during the third year, hopefully under warranty, your vendor could replace the drive with a fresh new one, and it should last the rest of the five year time frame. The failed drive, having 190GB or so usable capacity, could then be re-issued legally as a refurbished 146GB drive to someone else.
The wear and tear on SSD happens mostly during erase-write cycles, so for read-intensive workloads, such as boot disks for operating system images, the aggressive 200GB format might be fine, and might last the full five years.For traditional business applications (70 percent read, 30 percent write) or more write-intensive workloads, IBM feels the more conservative 146GB format is a safer bet.
This should be of no surprise to anyone. When it comes to the safety, security and integrity of our client's data, IBM has always emphasized the conservative approach.[Read More]
Well, it's Tuesday, which means IBM makes its announcements!
This week, IBM announces that it now supports 50GB Solid State Disk (SSD) in its [IBM System Storage EXP3000] disk systems.IBM has already made announcements about SSD enablement in the DS8000 and SAN Volume Controller (SVC), but now the EXP3000 brings SSD technology down to smaller System x server deployments.
Adoption of this new exciting technology is still in the early stages, despite the fact that IBM and other vendors have been touting this technology for a while. (For a quick blast to the past, here was my first post on the subject back from December 20, 2006: [Hybrid, Solid State and the future of RAID])Recently, fellow blogger BarryB admitted that EMC have only sold SSD to [hundreds of their customers], and to be fair, I suspect IBM's sales of SSD in its BladeCenter servers [available since July 2007] have been in similar single-digit percentage territory as well.
The advantage of today's announcement is that you can mix and match SSD drives with SAS and SATA drives in the EXP3000. You won't have to buy the entire drawer of SSD, you can start with just a few, depending on your business needs. On the other extreme, you can have up to two drawers, with 12 SSD drives each, for a total of 24 drives directly attached to System x servers via the ServeRAID MR10M SAS/SATA controller adapter.
Perhaps the recent financial meltdown is making storage vendors nervous.Both IBM and EMC gained market share in 3Q08, but EMC is acting strangelyat IBM's latest series of plays and announcements. Almost contradictory!
Benchmarks bad, rely on your own in-house evaluations instead
Let's start with fellow blogger Barry Burke from EMC, who offers his latest post[Benchmarketing Badly] with commentaryabout Enterprise Strategy Group's [DS5300 Lab Validation Report]. The IBM System Storage DS5300 is one of IBM's latest midrange disk systems recently announced. Take for example this excerpt from BarryB's blog post:
"I was pleasantly surprised to learn that both IBM and ESG agree with me about the relevance and importance of the Storage Performance Council benchmarks.
That is, SPC's are a meaningless tool by which to measure or compare enterprise storage arrays."
Nowhere in the ESG report says this, nor have I found any public statements from either IBM nor ESG that makes this claim. Instead, the ESG report explains that traditional benchmarks from the Storage Performance Council [SPC] focus on a single, specific workload, and ESG has chosen to complement this with a variety of other benchmarks to perform their product validation, including VMware's "VMmark", Oracle's Orion Utility, and Microsoft's JetStress.
Benchmarks provide prospective clients additional information to make purchasedecisions. IBM understands this, ESG understands this, and other well-respected companies like VMware, Oracle and Microsoft understand this. EMC is afraid that benchmarks mightencourage a client to "mistakenly" purchase a faster IBM product than a slower EMC product. Sunshine makes a great disinfectant, but EMC (and vampires) prefer their respective "prospects" remain in the dark.
Perhaps stranger still is BarryB's postscript. Here's an excerpt:
"... a customer here asked me if EMC would be willing to participate in an initiative to get multiple storage vendors to collaborate on truly representative real-world "enterprise-class" benchmarks, and I reassured him that I would personally sponsor active and objective participation in such an effort - IF he could get the others to join in with similar intent."
As I understand it, EMC was once part of the Storage Performance Council a long time ago, then chose to drop out of it. Why re-invent the wheel by creating yet another storage industry benchmark group? EMC is welcome to come back to SPC anytime! In addition to the SCP-1 and SPC-2 workloads, there is work underway for an SPC-3 benchmark. Each SPC workload provides additional insight for product comparisons to help with purchase decisions. If EMC can suggest an SPC-4 benchmark that it feels is more representative of real-world conditions, they are welcome to join the SPC party and make that a reality.
The old adage applies: ["It's better to light a candle than curse the darkness"]. EMC has been cursing the lack of what it considers to be acceptable benchmarks but has yet to offer anything more realistic or representative than SPC.What does EMC suggest you do instead? Get an evaluation box and run your own workloads and see for yourself! EMC has in the past offered evaluation units specifically for this purpose.
In-house evaluations bad, it's a trap!
Certainly, if you have the time and staff to run your own evaluation, with your own applications in your own environment, then I agree with EMC that this can provide better insight for your particular situation than standardized benchmarks.
In fact, that is exactly what IBM is doing for IBM XIV storage units, which are designed for Web 2.0 and Digital Archive workloads that current SPC benchmarks don't focus on. Fellow blogger Chuck Hollis from EMC opines in his post[Get yer free XIV!]. Here's an excerpt:
"Now that I think about it, this could get ugly. Imagine a customer who puts one on the floor to evaluate it, and -- in a moment of desperation or inattention -- puts production data on the device.
Nobody was paying attention, and there you are. Now IBM comes calling for their box back, and you've got a choice as to whether to go ahead and sign the P.O., or migrate all your data off the thing. Maybe they'll sell you an SVC to do this?
Yuck. I bet that happens more than once. And I can't believe that IBM (or the folks at XIV) aren't aware of this potentially happening."
Perhaps Chuck is speaking from experience here, as this may have happened with customers with EMC evaluation boxes, and is afraid this could happen with IBM XIV. I don't see anything unique about IBM XIV in the above concern. Typical evaluations involve copying test data onto the box, test it out with some particular application or workload, and then delete the data no longer required. Repeat as needed. Moving data off an IBM XIV is aseasy as moving data off an EMC DMX, EMC CLARiiON or EMC Celerra, and I am sure IBM wouldgladly demonstrate this on any EMC gear you now have.
Thanks to its clever RAID-X implementation, losing data on an IBM XIV is less likely thanlosing data on any RAID-5 based disk array from any storage vendor. Of course, there will always be skeptics about new technology that will want to try the box out for themselves.
If EMC thought the IBM XIV had nothing unique to offer, that its performance was just "OK",and is not as easy to manage as IBM says it is, then you would think EMC would gladly encourage such evaluations and comparisons, right?
No, I think EMC is afraid that companies will discover what they already know, that IBM has quality products that would stand a fair chance of side-by-side comparisons with their own offerings.We have enough fear, uncertainty and doubt from our current meltdown of the global financial markets, don't let EMC add any more.
Have a safe and fun Halloween! If you need to add some light to your otherwise dark surroundings, consider some of these ideas for [Jack-O-Lanterns]!
The focus on square footage resulted in higher density. This reminds me of the classicIBM commercial ["The Heist"] where Gil panics that the roomful of servers are missing, and Ned explains that it was all consolidated ontoa single IBM server.
I suspect few people picked up on the fact that the acronym for["new enterprise datacenter"] spells "Ned", ourdonut-eating hero in these series of videos.
Costs in the data center are proportional to power usage rather than space.
Power efficiency is more of a behavior problem than it is a technology problem.
This is definitely a step in the right direction. Both servers and storage systems consume a large portionof the energy on the data center floor. IBM Tivoli Usage and Accounting Manager can includeenergy consumption as part of the chargeback calculations.
( I cannot take credit for coining the new term "bleg". I saw this term firstused over on the [FreakonomicsBlog]. If you have not yet read the book "Freakonomics", I highly recommend it! The authors' blog is excellent as well.)
For this comparison, it is important to figure out how much workload a mainframe can support, how much an x86 cansupport, and then divide one from the other. Sounds simple enough, right? And what workload should you choose?IBM chose a business-oriented "data-intensive" workload using Oracle database. (If you wanted instead a scientific"compute-intensive" workload, consider an [IBM supercomputer] instead, the most recent of which clocked in over 1 quadrillion floating point operations per second, or PetaFLOP.) IBM compares the following two systems:
Sun Fire X2100 M2, model 1220 server (2-way)
IBM did not pick a wimpy machine to compare against. The model 1220 is the fastest in the series, with a 2.8Ghz x86-64 dual-core AMD Opteron processor, capable of running various levels of Solaris, Linux or Windows.In our case, we will use Oracle workloads running on Red Hat Enterprise Linux.All of the technical specifications are available at the[Sun Microsystems Sun Fire X1200] Web site.I am sure that there are comparable models from HP, Dell or even IBM that could have been used for this comparison.
IBM z10 Enterprise Class mainframe model E64 (64-way)
This machine can run a variety of operating systems also, including Red Hat Enterprise Linux (RHEL). The E64 has four "multiple processor modules" called"processor books" for a total of 77 processing units: 64 central processors, 11 system assist processors (SAP) and 2 spares. That's right, spare processors, in case any others gobad, IBM has got your back. You can designate a central processor in a variety of flavors. For running z/VM and Linux operating systems, the central processors can be put into "Integrated Facility for Linux" (IFL) mode.On IT Jungle, Timothy Patrick Morgan explains the z10 EC in his article[IBM Launches 64-Way z10 Enterprise Class Mainframe Behemoth]. For more information on the z10 EC, see the 110-page [Technical Introduction], orread the specifications on the[IBM z10 EC] Web site.
In a shop full of x86 servers, there are production servers, test and development servers, quality assuranceservers, standby idle servers for high availability, and so on. On average, these are only 10 percent utilized.For example, consider the following mix of servers:
125 Production machines running 70 percent busy
125 Backup machines running idle ready for active failover in case a production machine fails
1250 machines for test, development and quality assurance, running at 5 percent average utilization
While [some might question, dispute or challenge thisten percent] estimate, it matches the logic used to justify VMware, XEN, Virtual Iron or other virtualization technologies. Running 10 to 20 "virtual servers" on a single physical x86 machine assumes a similar 5-10 percent utilization rate.
Note: The following paragraphs have been revised per comments received.
Now the math. Jon, I want to make it clear I was not involved in writing the press release nor assisted with thesemath calculations. Please, don't shoot the messenger! Remember this cartoon where two scientists in white lab coats are writing mathcalculations on a chalkboard, and in the middle there is "and then a miracle happens..." to continue the rest ofthe calculations?
In this case, the miracle is the number that compares one server hardware platform to another. I am not going to bore people with details like the number of concurrent processor threads or the differencesbetween L1 and L3 cache. IBM used sophisticated tools and third party involvement that I am not allowed to talk about, and I have discussed this post with lawyers representing four (now five) different organizations already,so for the purposes of illustration and explanation only, I have reverse-engineered a new z10-to-Opteron conversion factor as 6.866 z10 EC MIPS per GHz of dual-core AMD Opteron for I/O-intensive workloads running only 10 percent average CPU utilization. Business applications that perform a lot of I/O don't use their CPU as much as other workloads.For compute-intensive or memory-intensive workloads, the conversion factor may be quite different, like 200 MIPS per GHz, as Jeff Savit from Sun Microsystems points out in the comments below.
Keep in mind that each processor is different, and we now have Intel, AMD, SPARC, PA-RISC and POWER (and others); 32-bit versus 64-bit; dual-core and quad-core; and different co-processor chip sets to worry about. AMD Opteron processors come in different speeds, but we are comparing against the 2.8GHz, so 1500 times 6.866 times 2.8 is 28,337. Since these would be running as Linux guestsunder z/VM, we add an additional 7 percent overhead or 2,019 MIPS. We then subtract 15 percent for "smoothing", whichis what happens when you consolidate workloads that have different peaks and valleys in workload, or 4,326 MIPS.The end is that we need a machine to do 26,530 MIPS. Thanks to advances in "Hypervisor" technological synergy between the z/VM operating system and the underlying z10 EC hardware, the mainframe can easily run 90 percent utilized when aggregating multiple workloads, so a 29,477 MIPS machine running at 90 percent utilization can handle these 26,530 MIPS.
N-way machines, from a little 2-way Sun Fire X2100 to the might 64-way z10 EC mainframe, are called "Symmetric Multiprocessors". All of the processors or cores are in play, but sometimes they have to taketurns, wait for exclusive access on a shared resource, such as cache or the bus. When your car is stopped at a red light, you are waiting for your turn to use the shared "intersection". As a result, you don't get linear improvement, but rather you get diminishing returns. This is known generically as the "SMP effect", and in IBM documentsthis as [Large System Performance Reference].While a 1-way z10 EC can handle 920 MIPS, the 64-way can only handle30,657 MIPS. The 29,477 MIPS needed for the Sun x2100 workload can be handled by a 61-way, giving you three extraprocessors to handle unexpected peaks in workload.
But are 1500 Linux guest images architecturally possible? A long time ago, David Boyes of[Sine Nomine Associates] ran 41,400 Linux guest images on a single mainframe using his [Test Plan Charlie], and IBM internallywas able to get 98,000 images, and in both cases these were on machines less powerful than the z10 EC. Neitherof these were tests ran I/O intensive workloads, but extreme limits are always worth testing. The 1500-to-1 reduction in IBM's press release is edge-of-the-envelope as well, so in production environments, several hundred guest images are probably more realistic, and still offer significant TCO savings.
The z10 EC can handle up to 60 LPARs, and each LPAR can run z/VM which acts much like VMware in allowing multipleLinux guests per z/VM instance. For 1500 Linux guests, you could have 25 guests each on 60 z/VM LPARs, or 250 guests on each of six z/VM LPARs, or 750 guests on two LPARs. with z/VM 5.3, each LPAR can support up to 256GB of memory and 32 processors, so you need at least two LPAR to use all 64 engines. Also, there are good reasons to have different guests under different z/VM LPARs, such as separating development/test from production workloads. If you had to re-IPLa specific z/VM LPAR, it could be done without impacting the workloads on other LPARs.
To access storage, IBM offers N-port ID Virtualization (NPIV). Without NPIV, two Linux guest images could not accessthe same LUN through the same FCP port because this would confuse the Host Bus Adapter (HBA), which IBM calls "FICON Express" cards. For example, Linux guest 1 asks to read LUN 587 block 32 and this is sent out a specific port, to a switch, to a disk system. Meanwhile, Linux guest 2 asks to read LUN 587 block 49. The data comes back to the z10 EC with the data, gives it to the correct z/VM LPAR, but then what? How does z/VM know which of the many Linux guests to give the data to? Both touched the same LUN, so it is unclear which made the request. To solve this, NPIV assigns a virtual "World Wide Port Name" (WWPN), up to 256 of them per physical port, so you can have up to 256 Linux guests sharing the same physical HBA port to access the same LUN.If you had 250 guests on each of six z/VM LPARs, and each LPAR had its own set of HBA ports, then all 1500 guestscould access the same LUN.
Yes, the z10 EC machines support Sysplex. The concept is confusing, but "Sysplex" in IBM terminology just means that you can have LPARs either on the same machine or on separate mainframes, all sharing the same time source, whether this be a "Sysplex Timer" or by using the "Server Time Protocol" (STP). The z10 EC can have STP over 6 Gbps Infiniband over distance. If you wantedto have all 1500 Linux guests time stamp data identically, all six z/VM LPARs need access to the shared time source. This can help in a re-do or roll-back situation for Oracle databases to complete or back-out "Units of Work" transactions. This time stamp is also used to form consistency groups in "z/OS Global Mirror", formerly called "XRC" for Extended Remote Distance Copy. Currently, the "timestamp" on I/O applies only to z/OS and Linux and not other operating systems. (The time stamp is done through the CDK driver on Linux, and contributed back to theopen source community so that it is available from both Novell SUSE and Red Hat distributions.)To have XRC have consistency between z/OS and Linux, the Linux guests would need to access native CKD volumes,rather than VM Minidisks or FCP-oriented LUNs.
Note: this is different than "Parallel Sysplex" which refers to having up to 32 z/OS images sharing a common "Coupling Facility" which acts as shared memory for applications. z/VM and Linux do not participate in"Parallel Sysplex".
As for the price, mainframes list for as little as "six figures" to as much as several million dollars, but I have no idea how much this particular model would cost. And, of course, this is just the hardware cost. I could not find the math for the $667 per server replacement you mentioned, so don't have details on that.You would need to purchase z/VM licenses, and possibly support contracts for Linux on System z to be fully comparable to all of the software license and support costs of the VMware, Solaris, Linux and/or Windows licenses you run on the x86 machines.
This is where a lot of the savings come from, as a lot of software is licensed "per processor" or "per core", and so software on 64 mainframe processors can be substantially less expensive than 1500 processors or 3000 cores.IBM does "eat its own cooking" in this case. IBM is consolidating 3900 one-application-each rack-mounted serversonto 30 mainframes, for a ratio of 130-to-1 and getting amazingly reduced TCO. The savings are in the followingareas:
Hardware infrastructure. It's not just servers, but racks, PDUs, etc. It turns out to be less expensive to incrementally add more CPU and storage to an existing mainframe than to add or replace older rack-em-and-stack-emwith newer models of the same.
Cables. Virtual servers can talk to each other in the same machine virtually, such as HiperSockets, eliminatingmany cables. NPIV allows many guests to share expensive cables to external devices.
Networking ports. Both LAN and SAN networking gear can be greatly reduced because fewer ports are needed.
Administration. We have Universities that can offer a guest image for every student without having a majorimpact to the sys-admins, as the students can do much of their administration remotely, without having physicalaccess to the machinery. Companies uses mainframe to host hundreds of virtual guests find reductions too!
Connectivity. Consolidating distributed servers in many locations to a mainframe in one location allows youto reduce connections to the outside world. Instead of sixteen OC3 lines for sixteen different data centers, you could have one big OC48 line instead to a single data center.
Software licenses. Licenses based on servers, cores or CPUs are reduced when you consolidate to the mainframe.
Floorspace. Generally, floorspace is not in short supply in the USA, but in other areas it can be an issue.
Power and Cooling. IBM has experienced significant reduction in power consumption and cooling requirementsin its own consolidation efforts.
All of the components of DFSMS (including DFP, DFHSM, DFDSS and DFRMM) were merged into a single product "DFSMS for z/OS" and is now an included element in the base z/OS operating system. As a result of these, customers typically have 80 to 90 percent utilization on their mainframe disk. For the 1500 Linux guests, however, most of the DFSMS features of z/OS do not apply. These functions were not "ported over" to z/VM nor Linux on any platform.
Instead, the DFSMS concepts have been re-implemented into a new product called "Scale-Out File Services" (SOFS) which would provide NAS interfaces to a blendeddisk-and-tape environment. The SOFS disk can be kept at 90 percent utilization because policies can place data, movedata and even expire files, just like DFSMS does for z/OS data sets. SOFS supports standard NAS protocols such as CIFS,NFS, FTP and HTTP, and these could be access from the 1500 Linux guests over an Ethernet Network Interface Card (NIC), which IBM calls "OSA Express" cards.
Lastly, IBM z10 EC is not emulating x86 or x86-64 interfaces for any of these workloads. No doubt IBM and AMD could collaborate together to come up with an AMD Opteron emulator for the S/390 chipset, and load Windows 2003 right on top of it, but that would just result in all kinds of emulation overhead.Instead, Linux on System z guests can run comparable workloads. There are many Linux applications that are functionally equivalent or the same as their Windows counterparts. If you run Oracle on Windows, you could runOracle on Linux. If you run MS Exchange on Windows, you could run Bynari on Linux and let all of your Outlook Expressusers not even know their Exchange server had been moved! Linux guest images can be application servers, web servers, database servers, network infrastructure servers, file servers, firewall, DNS, and so on. For nearly any business workload you can assign to an x86 server in a datacenter, there is likely an option for Linux on System z.
Hope this answers all of your questions, Jon. These were estimates based on basic assumptions. This is not to imply that IBM z10 EC and VMware are the only technologies that help in this area, you can certainly find virtualization on other systems and through other software.I have asked IBM to make public the "TCO framework" that sheds more light on this.As they say, "Your mileage may vary."
For more on this series, check out the following posts:
If in your travels, Jon, you run into someone interested to see how IBM could help consolidate rack-mounted servers over to a z10 EC mainframe, have them ask IBM for a "Scorpion study". That is the name of the assessment that evaluates a specific clientsituation, and can then recommend a more accurate estimate configuration.
In my post yesterday [Spreading out the Re-Replication process], fellow blogger BarryB [aka The Storage Anarchist]raises some interesting points and questions in the comments section about the new IBM XIV Nextra architecture.I answer these below not just for the benefit of my friends at EMC, but also for my own colleagues within IBM,IBM Business Partners, Analysts and clients that might have similar questions.
If RAID 5/6 makes sense on every other platform, why not so on the Web 2.0 platform?
Your attempt to justify the expense of Mirrored vs. RAID 5 makes no sense to me. Buying two drives for every one drive's worth of usable capacity is expensive, even with SATA drives. Isn't that why you offer RAID 5 and RAID 6 on the storage arrays that you sell with SATA drives?
And if RAID 5/6 makes sense on every other platform, why not so on the (extremely cost-sensitive) Web 2.0 platform? Is faster rebuild really worth the cost of 40+% more spindles? Or is the overhead of RAID 6 really too much for those low-cost commodity servers to handle.
Let's take a look at various disk configurations, for example 3TB on 750GB SATA drives:
JBOD: 4 drives
JBOD here is industry slang for "Just a Bunch of Disks" and was invented as the term for "non-RAID".Each drive would be accessible independently, at native single-drive speed, with no data protection. Puttingfour drives in a single cabinet like this provides simplicity and convenience only over four separate drivesin their own enclosures.
RAID-10: 8 drives
RAID-10 is a combination of RAID-1 (mirroring) and RAID-0 (striping). In a 4x2 configuration, data is striped across disks 1-4,then these are mirrored across to disks 5-8. You get performance improvement and protection against a singledrive failure.
RAID-5: 5 drives
This would be a 4+P configuration, where there would be four drives' worth of data scattered across fivedrives. This gives you almost the same performance improvement as RAID-10, similar protection againstsingle drive failure, but with fewer drives per usable TB capacity.
RAID-6: 6 drives
This would be a 4+2P configuration, where the first P represents linear parity, and the second represents a diagonal parity. Similar in performance improvement as RAID-5, but protects against single and double drive failures, and still better than RAID-10 in terms of drives per TB usable capacity.
For all the RAID configurations, rebuild would require a spare drive, but often spares are shared among multiple RAID ranks, not dedicated to a single rank. To this end, you often have to have several spares per I/O loop, and a different set of spares for each kind of speed and capacity. If you had a mix of 15K/73GB, 10K/146GB, and 7200/500GB drives, then you would have three sets of spares to match.
In contrast, IBM XIV's innovative RAID-X approach doesn't requireany spare drives, just spare capacity on existing drives being used to hold data. The objects can be mirroredbetween any two types of drives, so no need to match one with another.
All of these RAID levels represent some trade-off between cost, protection and performance, and IBM offers each of theseon various disk systems platforms. Calculating parity is more complicated than just mirrored copies, but this can be done with specialized chips in cache memory to minimize performance impact.IBM generally recommends RAID-5 for high-performance FC disk, and RAID-6 for slower, large capacity SATA disk.
However, the questionassumes that the drive cost is a large portion of the overall "disk system" cost. It isn't. For example,Jon Toigo discusses the cost of EMC's new AX4 disk system in his post [National Storage Rip-Off Day]:
EMC is releasing its low end Clariion AX4 SAS/SATA array with 3TB capacity for $8600. It ships with four 750GB SATA drives (which you and I could buy at list for $239 per unit). So, if the disk drives cost $956 (presumably far less for EMC), that means buyers of the EMC wares are paying about $7700 for a tin case, a controller/backplane, and a 4Gbps iSCSI or FC connector. Hmm.
Dell is offering EMC’s AX4-5 with same configuration for $13,000 adding a 24/7 warranty.
(Note: I checked these numbers. $8599 is the list price that EMC has on its own website. External 750GB drivesavailable at my local Circuit City ranged from $189 to $329 list price. I could not find anything on Dell'sown website, but found [The Register] to confirm the $13,000 with 24x7 warranty figure.)
Disk capacity is a shrinking portion of the total cost of ownership (TCO). In addition to capacity, you are paying forcache, microcode and electronics of the system itself, along with software and services that are included in the mix,and your own storage administrators to deal with configuration and management. For more on this, see [XIV storage - Low Total Cost of Ownership].
EMC Centera has been doing this exact type of blob striping and protection since 2002
As I've noted before, there's nothing "magic" about it - Centera has been employing the same type of object-level replication for years. Only EMC's engineers have figured out how to do RAID protection instead of mirroring to keep the hardware costs low while not sacrificing availability.
I agree that IBM XIV was not the first to do an object-level architecture, but it was one of the first to apply object-level technologies to the particular "use case" and "intended workload" of Web 2.0 applications.
RAID-5 based EMC Centera was designed insteadto hold fixed-content data that needed to be protected for a specific period of time, such as to meet government regulatory compliance requirements. This is data that you most likelywill never look at again unless you are hit with a lawsuit or investigation. For this reason, it is important to get it on the cheapest storage configuration as possible. Before EMC Centera, customers stored this data on WORM tape and optical media, so EMC came up with a disk-only alternative offering.IBM System Storage DR550 offers disk-level access for themost recent archives, with the ability to migrate to much less expensive tape for the long term retention. The end result is that storing on a blended disk-plus-tape solution can help reduce the cost by a factor of 5x to 7x, making RAID level discussion meaningless in this environment. For moreon this, see my post [OptimizingData Retention and Archiving].
While both the Centera and DR550 are based on SATA, neither are designed for Web 2.0 platforms.When EMC comes out with their own "me, too" version, they will probably make a similar argument.
IBM XIV Nextra is not a DS8000 replacement
Nextra is anything but Enterprise-class storage, much less a DS8000 replacement. How silly of all those folks to suggest such a thing.
I did searches on the Web and could not find anybody, other than EMC employees, who suggested that IBM XIV Nextra architecture represented a replacement for IBM System Storage DS8000. The IBM XIV press release does not mentionor imply this, and certainly nobody I know at IBM has suggested this.
The DS8000 is designed for a different "use case" andset of "intended workloads" than what the IBM XIV was designed for. The DS8000 is the most popular disk systemfor our IBM System z mainframe platform, for activities like Online Transaction Processing (OLTP) and large databases, supporting ESCON and FICON attachment to high-speed 15K RPM FC drives. Web 2.0 customers that might chooseIBM XIV Nextra for their digital content might run their financial operations or metadata search indexes on DS8000.Different storage for different purposes.
As for the opinion that this is not "enterprise class", there are a variety of definitions that refer to this phrase.Some analysts look at "price band" of units that cost over $300,000 US dollars. Other analysts define this as beingattachable to mainframe servers via ESCON or FICON. Others use the term to refer to five-nines reliability, havingless than 5 minutes downtime per year. In this regard, based on the past two years experience at 40 customer locations,I would argue that it meets this last definition, with non-disruptive upgrades, microcode updates and hot-swappable components.
By comparison, when EMC introduced its object-level Centera architecture, nobody suggested it was the replacement for their Symmetrix or CLARiiON devices. Was it supposed to be?
Given drive growth rates have slowed, improving utilization is mandatory to keep up with 60-70 percent CAGR
Look around you, Tony- all of your competitors are implementing thin provisioning specifically to drive physical utilization upwards towards 60-80%, and that's on top of RAID 5/RAID 6 storage and not RAID 1. Given that disk drive growth rates and $/GB cost savings have slowed significantly, improving utilization is mandatory just to keep up with the 60-70% CAGR of information growth.
Disk drive capacities have slowed for FC disk because much of the attention and investment has been re-directed to ATA technology. Dollar-per-GB price reduction is slowing for disks in general, as researchers are hitting physicallimitations to the amount of bits they can pack per square inch of disk media, and is now around 25 percent per year.The 60-70 percent Compound Annual Growth Rate (CAGR) is real, and can be even growing faster for Web 2.0providers. While hardware costs drop, the big ticket items to watch will be software, services and storage administrator labor costs.
To this end, IBM XIV Nextra offers thin provisioning and differential space-efficient snapshots. It is designed for 60-90 percent utilization, and can be expanded to larger capacities non-disruptively in a very scalable manner.
On his The Storage Architect blog, Chris Evans wrote [Twofor the Price of One]. He asks: why use RAID-1 compared to say a 14+2 RAID-6 configuration which would be much cheaper in terms of the disk cost? Perhpaps without realizing it, answers itwith his post today [XIV part II]:
So, as a drive fails, all drives could be copying to all drives in an attempt to ensure the recreated lost mirrors are well distributed across the subsystem. If this is true, all drives would become busy for read/writes for the rebuild time, rather than rebuild overhead being isolated to just one RAID group.
Let me try to explain. (Note: This is an oversimplification of the actual algorithm in an effortto make it more accessible to most readers, based on written materials I have been provided as partof the acquisition.)
In a typical RAID environment, say 7+P RAID-5, you might have to read 7 drives to rebuild one drive, and in the case of a 14+2 RAID-6, reading 15 drives to rebuild one drive. It turns out the performance bottleneck is the one driveto write, and today's systems can rebuild faster Fibre Channel (FC) drives at about 50-55 MB/sec, and slower ATA disk at around 40-42 MB/sec. At these rates, a 750GB SATA rebuild would take at least 5 hours.
In the IBM XIV Nextra architecture, let's say we have 100 drives. We lose drive 13, and we need to re-replicate any at-risk 1MB objects.An object is at-risk if it is the last and only remaining copy on the system. A 750GB that is 90 percent full wouldhave 700,000 or so at-risk object re-replications to manage. These can be sorted by drive. Drive 1 might have about 7000 objects that need re-replication, drive 2might have slightly more, slightly less, and so on, up to drive 100. The re-replication of objects on these other 99 drives goes through three waves.
Select 49 drives as "source volumes", and pair each randomly with a "destination volume". For example, drive 1 mapped todrive 87, drive 2 to drive 59, and so on. Initiate 49 tasks in parallel, each will re-replicate the blocks thatneed to be copied from the source volume to the destination volume.
50 volumes left.Select another 49 drives as "source volumes", and pair each with a "destination volume". For example, drive 87 mapped todrive 15, drive 59 to drive 42, and so on. Initiate 49 tasks in parallel, each will re-replicate the blocks thatneed to be copied from the source volume to the destination volume.
Only one drive left. We select the last volume as the source volume, pair it off with a random destination volume,and complete the process.
Each wave can take as little as 3-5 minutes. The actual algorithm is more complicated than this, as tasks complete early the source and volumes drives are available for re-assignment to another task, but you get the idea. XIV hasdemonstrated the entire process, identifying all at-risk objects, sorting them by drive location, randomly selectingdrive pairs, and then performing most of these tasks in parallel, can be done in 15-20 minutes. Over 40 customershave been using this architecture over the past 2 years, and by now all have probably experienced at least adrive failure to validate this methodology.
In the unlikely event that a second drive fails during this short time, only one of the 99 task fails. The other 98 tasks continue to helpprotect the data. By comparison, in a RAID-5 rebuild, no data is protected until all the blocks are copied.
As for requiring spare capacity on each drive to handle this case, the best disks in production environments aretypically only 85-90 percent full, leaving plenty of spare capacity to handle re-replication process. On average,Linux, UNIX and Windows systems tend to only fill disks 30 to 50 percent full, so the fear there is not enough sparecapacity should not be an issue.
The difference in cost between RAID-1 and RAID-5 becomes minimal as hardware gets cheaper and cheaper. For every $1 dollar you spend on storage hardware, you spend $5-$8 dollars managing the environment. As hardware gets cheaper still, it might even be worth making three copies of every 1MB object, the parallel processto perform re-replications would be the same. This could be done using policy-based management, some data gets triple-copied, and other data gets only double-copied, based on whether the user selected "premium" or "basic" service.
The beauty of this approach is that it works with 100 drives, 1000 drives, or even a million drives. Parallel processingis how supercomputers are able to perform feats of amazing mathematical computations so quickly, and how Web 2.0services like Google and Yahoo can perform web searches so quickly. Spreading the re-replication process acrossmany drives in parallel, rather than performing them serially onto a single drive, is just one of the many uniquefeatures of this new architecture.
Well, we had another successful event in Second Life today.
Unlike our April 26 launch of our System Storage products for IBM Business Partners only, this time we decided this time to make it as a "Meet the Storage Experts" Q&A Panel format, and open up registration to everyone. Thesubject matter experts sat at the front of the room on four stools. We had six rows of chairs arrangedsemi-circularly.
Shown above, from left to right, are the avatars of our four experts:
IBM System Storage N series, focusing on recent N3000 disk system announcements
Harold Pike (holding the microphone while speaking)
IBM System Storage DS3000 and DS4000 series, focusing on recent DS3000 disk system announcements
IBM System Storage TS series, focusing on recent TS2230, TS3400 and TS7700 tape system announcements
IBM storage networking, focusing on recent IBM SAN256B director blade announcements
While Eric was a veteran Second Lifer, having presented at our April event, the other three were trainedon how to raise their hand, speak into the microphone, sit on the stool, and so on. I want to thank allof our experts for putting in this effort!
The event was produced by Katrina H Smith. She did a great job, and made sure we were on top ofall the issues and tasks required to get the job done. Running a Second Life event is every bit ashard as running a real face-to-face event. We had several meetings to discuss venue details, placementof chairs, placement of product demos, audio/video recording, wall decorations, tee-shirt and coffee mug design, logistics, and so on.
I acted as moderator/emcee for the event. That is my back in the picture above. The process wassimple, modeled after the "Birds of a Feather" sessions at events like SHARE and the IBMStorage and Storage Networking Symposium. We threw out a list of topics the experts would cover,and people in the audience would "raise their left hand". I, as the moderator, would then walkover to each person, and hold out the microphone for them to ask the question. I would then repeat the question and ask the appropriate expert to provide an answer. We defined gestures onhow to "raise hand" and "put hand down" that we gave to each registered participant.
We had four dedicated "camera-avatars" in world to capture both video and screenshots.Our video editors are now working to edit "highlight videos" that we can use at future events, for training materials, and for our internal "BlueTube" online video system.
The room was filled with examples of each of our products, made into 3D objects that were dimensionallycorrect, and "textured" with photographs of the actual products. If you click on an object, you get a "notecard" that provided more information. Special thanks to Scott Bissmeyer for making all of theseobjects for us.
We made posters of each expert and placed them in all four corners of the room. On the bottom of each coffee mug was a picture of each of the experts, and if you walked under each of the posters, you were"dispensed" a coffee mug matching the expert shown in the poster.Participants could "Collect all Four!" When you bring the coffee mug up to takea sip, the picture on the bottom of the mug is exposed for all to see.And as a final give-away to the audience, we made a variety of event tee-shirts and polo-shirts.
At the end of the session, we asked everyone to click on the "Survey" kiosk near the exit door. We askedsix simple questions using SurveyMonkey.com that took only a fewminutes to process. We found asking questions immediately at the end of the event was the best way tocapture this feedback.
From a "Green" perspective, we had people registered from the following countries: US, India, Mexico,Australia, United Kingdom, Brazil, Germany, Argentina, Chile, China, Canada, and Venezuela. Second Lifeallows all these people who probably could not travel, or could not afford the time and expense to travel,to participate in a simulated face-to-face meeting without energy consumption of traditional travel methods.
More importantly, we got several leads for business. People often ask "Yes, but is there any businessassociated with this?" This time, there was, based on the answers to the questions, several avatars asked for a real sales call to follow-up on the products and offerings they were discussed.
With such a great success, we have already scheduled our next Second Life event, November 8. Mark your calendars! I'll postmore details on the registration process of the November event when available.
The proof-of-concept that IBM Haifa research center developed back in 1998 became what we now call the iSCSI protocol.The book iSCSI: The Universal Storage Connection introduces the history as follows:
In the fall of 1999 IBM and Cisco met to discuss the possibility of combining their SCSI-over-TCP/IP efforts. After Cisco saw IBM's demonstration of SCSI over TCP/IP, the two companies agreed to develop a proposal that would be taken to the IETF for standardization.
There are three ways to introduce iSCSI into your data center:
Through a gateway, like the IBM System Storage N series gateway, that allows iSCSI-based servers connect to FC-based storage devices
Through a SAN switch or director, a FC-based server can access iSCSI-based storage, an iSCSI-based server accessing FC-based storage, or even iSCSI-based servers attaching to iSCSI-based storage.
Directly through the storage controller.
IBM has been delivering the first method with its successful IBM System Storage N series gateway products, buttoday we have announced additional support for the second and third methods.Here's a quick recap.
New SAN director blades
Supporting the second method, IBM TotalStorage SAN256B Director is enhanced to deliver iSCSI functionality with a new M48 iSCSI Blade, which includes 16 ports (8 Fibre Channel ports; and 8 Ethernet ports for iSCSI connectivity). We also announced a new Fibre Channel M48 Blade which provides 10 Gbps Fibre Channel Inter Switch Link (ISL) connectivity between SAN256B Directors.
With support for Boot-over-iSCSI, diskless rack-optimized and blade servers can boot Windows or Linux over Ethernet,eliminating the management hassles with internal disk.
All of this is part of IBM's overall push into the Small and Medium size Business marketplace, making it easier to shop for and buy from IBM and its many IBM Business Partners, easier to deploy and install storage, and easier tomanage the storage once you have it.
Some people find it surprising that it is often more cost-effective, and power-efficient, to run workloads on mainframe logical partitions (LPARs) than a stack of x86 servers running VMware.
Perhaps they won't be surprised any more. Here is an article in eWeek that explains how IBM isreducing energy costs 80% by consolidating 3,900 rack-optimized servers to 33 IBM System z mainframe servers, running Linux, in its own data centers. Since 1997, IBM has consolidated its 155 strategic worldwide data center locations down to just seven.
I am very pleased that IBM has invested heavily into Linux, with support across servers, storage, software andservices. Linux is allowing IBM to deliver clever, innovative solutions that may not be possible with other operating systems. If you are in storage, you should consider becoming more knowledgeable in Linux.
The older systems won't just end up in a landfill somewhere. Instead, the details are spelled out inthe IBM Press Release:
As part of the effort to protect the environment, IBM Global Asset Recovery Services, the refurbishment and recycling unit of IBM, will process and properly dispose of the 3,900 reclaimed systems. Newer units will be refurbished and resold through IBM's sales force and partner network, while older systems will be harvested for parts or sold for scrap. Prior to disposition, the machines will be scrubbed of all sensitive data. Any unusable e-waste will be properly disposed following environmentally compliant processes perfected over 20 years of leading environmental skill and experience in the area of IT asset disposition.
Whereas other vendors might think that some operational improvements will be enough, such as switching to higher-capacity SATA drives, or virtualizing x86 servers, IBM recognizes that sometimes more fundamental changes are required to effect real changes and real results.
Some job titles can be vague. Have you ever given your title to a person at a cocktail party, only to have to explain exactly what you do? With a title like "IBM Master Inventor and Senior Managing Consultant", this happens to me all the time. To help explain what we do at the Tucson Executive Briefing Center (EBC), I use the following analogy.
People who want to see or interact with animals have several options. One option is to go visit the animals in their natural habitat. A more convenient option, however, is to visit the animals in a zoo. Zoos bring together a wide variety of animals, making it convenient to visit all of them at one time.
I did not fully appreciate the advantage of zoos until I took a safari in Kenya, Africa a few years ago. The word safari means "long journey" in Swahili. For two weeks, we drove around in a Land Rover on bumpy roads across the country. The best time to see the animals was early in the morning and late in the afternoon. We would drive around for hours looking for a type animal we had not seen already. Most came to see the so-called "Big Five": Buffalo, Elephant, Leopard, Lion and Rhinoceros. After two weeks and hundreds of miles, we had seen the "Big Nine" which extends the Big Five to include the Cheetah, Zebra, Giraffe and Hippo, as well as seeing a variety of other, lesser known animals.
When it comes to zoos, there are two kinds.
Self-guided -- offering the basic zoo experience where you are handed a map to visit the animals on your own.
Docent-guided -- offering a richer zoo experience where the docent provides added value, leading visitors around the zoo, answering questions, providing education, and comparing the differences between the animals.
Over the past 15 years, IBM has been consolidating storage development in Tucson, Arizona moving storage-related projects from San Jose, CA, from Rochester, MN, and from Raleigh, NC. Tucson has the largest collection of IBM storage hardware and software development in North America. I am one of the three local "docents", guiding the clients that come to Tucson to visit the developers.
Here are some of the types of developers that our clients ask to interact with:
A was hired into IBM back in 1986 as a Research Scientist. When clients want to hear about IBM's future direction over the next 10-15 years, we bring in someone from IBM Research.
While disk systems may seem no more complicated as arranging books on a shelf, clients often want to talk to hardware engineers related to IBM's tape libraries, especially the IBM System Storage TS3500 library and the High-Density frame that can store multiple cartridges per slot in a spring-loaded manner.
I have a Bachelor's degree in Computer Engineering and Master's degree in Electrical Engineering, so I am able to speak both sides of the hardware/software divide. Software engineers here in Tucson develop the microcode that runs on disk and tape hardware, the various GUI, CLI and SMI-S API interfaces, as well as Tivoli Storage software, especially Tivoli Storage Manager (TSM) and Tivoli Storage Productivity Center.
IBM Tucson has a huge test lab, and our testers are very familiar with all of the subtle nuances of interoperability between servers, HBAs, switches and storage devices. We have system and function testers for the individual products, ISV testers to validate software compatability, performance testers, and environment testers to verify the storage devices can handle extremes in temperature, humidity, vibration and noise.
IBM has architects for each product line to help decide which features and functions are developed for each product release. While many software engineers have expertise narrowly focused on an individual component, the system architects need to have a broad awareness of the entire environment. Earlier in my career, I was the chief architect for DFSMS, the storage management element of the z/OS mainframe operating sytsem, and chief architect for what we now call Tivoli Storage Productivity Center.
Product and Portfolio Managers
Product and Portfolio managers are helpful to explain to clients why IBM invested more in some products than others. I had served as the Portfolio Manager for IBM tape systems. When clients want to talk about the business side of our products, such as pricing, licensing and leasing issues, we bring the product and portfolio managers in.
For some clients, high level executives want to speak to their counterparts at IBM, vice president to vice president, executive to executive. Our local IBM executives often help kick off the briefing in the morning, or provide the executive summary and discuss next steps at the end of the day. Golfing, dinners and drinks, of course, are always a popular scheduing option.
On behalf of the rest of the Tucson EBC, I would like to thank all the developers who have helped us last year with client briefings. There are too many to mention, and most are too humble to let me put their names in this blog. Team, your assistance is very appreciated!
Many IBMers consider Tucson to be the headquarters for storage, and I have heard IBM executives refer to Tucson as the center of the universe for storage products. However, IBM is a global company. Just as zoos do not pretend to be complete collections of animals, IBM storage development is not entirely contained in Tucson. IBM Research for storage is also done in Almaden CA, Yorktown Heights NY, and Haifa, Israel. Hardware development is also done in Japan, Europe and Israel. Tivoli Storage has locations in Beaverton, Oregon, and Austin, Texas, to name a few. IBM is a big company, so if I left your favorite location off the list, let me know in the comments below.
Some clients, sales reps and business partners have complained that Tucson is not the most convenient location to get to. I get that. One rep asked why we don't have briefing centers somewhere more accessible, such as Chicago or Atlanta, both cities offer a major airline hub. As much as I personally enjoy cities like Chicago or Atlanta, people don't visit zoos just to see the docents, they come to see the animals. Having docents located in Chicago or Atlanta, standing sadly in front of empty cages with no animals to interact with, makes no sense at all.
With over 350 days of sunshine per year, Tucson is actually a well-kept secret. Clients who have never been to Tucson discover the wonders of the Sonoran desert. Coyotes chase roadrunners across our parking lot. Several clients who have come to visit us have ended up buying retirement homes here. If you haven't been to Tucson, or it has been a while since your last trip, I encourage you to [schedule a briefing]. The weather right now is ideal!
Did IBM XIV force EMC's hand to announce VMAXe? Let's take a stroll down memory lane.
In 2008, IBM XIV showed the world that it could ship a Tier-1, high-end, enterprise-class system using commodity parts. Technically, prior to its acquisition by IBM, the XIV team had boxes out in production since 2005. EMC incorrectly argued this announcement meant the death of the IBM DS8000. Just because EMC was unable to figure out how to have more than one high-end disk product, doesn't mean IBM or other storage vendors were equally challenged. Both IBM XIV and DS8000 are Tier-1, high-end, enterprise-class storage systems, as are the IBM N series N7900 and the IBM Scale-Out Network Attached Storage (SONAS).
In April 2009, EMC followed IBM's lead with their own V-Max system, based on Symmetrix Engenuity code, but on commodity x86 processors. Nobody at EMC suggested that the V-Max meant the death of their other Symmetrix box, the DMX-4, which means that EMC proved to themselves that a storage vendor could offer multiple high-end disk systems. Hitachi Data Systems (HDS) would later offer the VSP, which also includes some commodity hardware as well.
In July 2009, analysts at International Technology Group published their TCO findings that IBM XIV was 63 percent less expensive than EMC V-Max, in a whitepaper titled [COST/BENEFIT CASE
FOR IBM XIV STORAGE SYSTEM Comparing Costs for IBM XIV and EMC V-Max Systems]. Not surprisingly, EMC cried foul, feeling that EMC V-Max had not yet been successful in the field, it was too soon to compare newly minted EMC gear with a mature product like XIV that had been in production accounts for several years. Big companies like to wait for "Generation 1" of any new product to mature a bit before they purchase.
To compete against IBM XIV's very low TCO, EMC was forced to either deeply discount their Symmetrix, or counter-offer with lower-cost CLARiiON, their midrange disk offering. An ex-EMCer that now works for IBM on the XIV sales team put it in EMC terms -- "the IBM XIV provides a Symmetrix-like product at CLARiiON-like prices."
(Note: Somewhere in 2010, EMC dropped the hyphen, changing the name from V-Max to VMAX. I didn't see this formally announced anywhere, but it seems that the new spelling is the officially correct usage. A common marketing rule is that you should only rename failed products, so perhaps dropping the hyphen was EMC's way of preventing people from searching older reviews of the V-Max product.)
This month, IBM introduced the IBM XIV Gen3 model 114. The analysts at ITG updated their analysis, as there are now more customers that have either or both products, to provide a more thorough comparison. Their latest whitepaper, titled [Cost/Benefit Case for IBM XIV Systems: Comparing Cost
Structures for IBM XIV and EMC VMAX Systems], shows that IBM maintains its substantial cost savings advantage, representing 69 percent less Total Cost of Ownership (TCO) than EMC, on average, over the course of three years.
In response, EMC announced its new VMAXe, following the naming convention EMC established for VNX and VNXe. Customers cannot upgrade VNXe to VNX, nor VMAXe to VMAX, so at least EMC was consistent in that regard. Like the IBM XIV and XIV Gen3, the new EMC VMAXe eliminated "unnecessary distractions" like CKD volumes and FICON attachment needed for the IBM z/OS operating system on IBM System z mainframes. Fellow blogger Barry Burke from EMC explains everything about the VMAXe in his blog post [a big thing in a small package].
So, you have to wonder, did IBM XIV force EMC's hand into offering this new VMAXe storage unit? Surely, EMC sales reps will continue to lead with the more profitable DMX-4 or VMAX, and then only offer the VMAXe when the prospective customer mentions that the IBM XIV Gen3 is 69 percent less expensive. I haven't seen any list or street prices for the VMAXe yet, but I suspect it is less expensive than VMAX, on a dollar-per-GB basis, so that EMC will not have to discount it as much to compete against IBM.
In his last post in this series, he mentions that the amazingly successful IBM SAN Volume Controller was part of a set of projects:
"IBM was looking for "new horizon" projects to fund at the time, and three such projects were proposed and created the "Storage Software Group". Those three projects became know externally as TPC, (TotalStorage Productivity Center), SanFS (SAN File System - oh how this was just 5 years too early) and SVC (SAN Volume Controller). The fact that two out of the three of them still exist today is actually pretty good. All of these products came out of research, and its a sad state of affairs when research teams are measured against the percentage of the projects they work on, versus those that turn into revenue generating streams."
But this raises the question: Was SAN File System just five years too early?
IBM classifies products into three "horizons"; Horizon-1 for well-established mature products, Horizon-2 was for recently launched products, and Horizon-3 was for emerging business opportunities (EBO). Since I had some involvement with these other projects, I thought I would help fill out some of this history from my perspective.
Back in 2000, IBM executive [Linda Sanford] was in charge of IBM storage business and presented that IBM Research was working on the concept of "Storage Tank" which would hold Petabytes of data accessible to mainframes and distributed servers.
In 2001, I was the lead architect of DFSMS for the IBM z/OS operating system for mainframes, and was asked to be lead architect for the new "Horizon 3" project to be called IBM TotalStorage Productivity Center (TPC), which has since been renamed to IBM Tivoli Storage Productivity Center.
In 2002, I was asked to lead a team to port the "SANfs client" for SAN File System from Linux-x86 over to Linux on System z. How easy or difficult to port any code depends on how well it was written with the intent to be ported, and porting the "proof-of-concept" level code proved a bit too challenging for my team of relative new-hires. Once code written by research scientists is sufficiently complete to demonstrate proof of concept, it should be entirely discarded and written from scratch by professional software engineers that follow proper development and documentation procedures. We reminded management of this, and they decided not to make the necessary investment to add Linux on System z as a supported operating system for SAN file system.
In 2003, IBM launched Productivity Center, SAN File System and SAN Volume Controller. These would be lumped together with Horizon-1 product IBM Tivoli Storage Manager and the four products were promoted together as the inappropriately-named [TotalStorage Open Software Family]. We actually had long meetings debating whether SAN Volume Controller was hardware or software. While it is true that most of the features and functions of SAN Volume Controller is driven by its software, it was never packaged as a software-only offering.
The SAN File System was the productized version of the "Storage Tank" research project. While the SAN Volume Controller used industry standard Fibre Channel Protocol (FCP) to allow support of a variety of operating system clients, the SAN File System required an installed "client" that was only available initially on AIX and Linux-x86. In keeping with the "open" concept, an "open source reference client" was made available so that the folks at Hewlett-Packard, Sun Microsystems and Microsoft could port this over to their respective HP-UX, Solaris and Windows operating systems. Not surprisingly, none were willing to voluntarily add yet another file system to their testing efforts.
Barry argues that SANfs was five years ahead of its time. SAN File System tried to bring policy-based management for information, which has been part of DFSMS for z/OS since the 1980s, over to distributed operating systems. The problem is that mainframe people who understand and appreciate the benefits of policy-based management already had it, and non-mainframe couldn't understand the benefits of something they have managed to survive without.
(Every time I see VMware presented as a new or clever idea, I have to remind people that this x86-based hypervisor basically implements the mainframe concept of server virtualization introduced by IBM in the 1970s. IBM is the leading reseller of VMware, and supports other server virtualization solutions including Linux KVM, Xen, Hyper-V and PowerVM.)
To address the various concerns about SAN File System, the proof-of-concept code from IBM Research was withdrawn from marketing, and new fresh code implementing these concepts were integrated into IBM's existing General Parallel File System (GPFS). This software would then be packaged with a server hardware cluster, exporting global file spaces with broad operating system reach. Initially offered as IBM Scale-out File Services (SoFS) service offering, this was later re-packaged as an appliance, the IBM Scale-Out Network Attached Storage (SONAS) product, and as IBM Smart Business Storage Cloud (SBSC) cloud storage offering. These now offer clustered NAS storage using the industry standard NFS and CIFS clients that nearly all operating systems already have.
Today, these former Horizon-1 products are now Horizon-2 and Horizon-3. They have evolved. Tivoli Storage Productivity Center, GPFS and SAN Volume Controller are all market leaders in their respective areas.
“In times of universal deceit, telling the truth will be a revolutionary act.”
-- George Orwell
Well, it has been over two years since I first covered IBM's acquisition of the XIV company. Amazingly, I still see a lot of misperceptions out in the blogosphere, especially those regarding double drive failures for the XIV storage system. Despite various attempts to [explain XIV resiliency] and to [dispel the rumors], there are still competitors making stuff up, putting fear, uncertainty and doubt into the minds of prospective XIV clients.
Clients love the IBM XIV storage system! In this economy, companies are not stupid. Before buying any enterprise-class disk system, they ask the tough questions, run evaluation tests, and all the other due diligence often referred to as "kicking the tires". Here is what some IBM clients have said about their XIV systems:
“3-5 minutes vs. 8-10 hours rebuild time...”
-- satisfied XIV client
“...we tested an entire module failure - all data is re-distributed in under 6 hours...only 3-5% performance degradation during rebuild...”
-- excited XIV client
“Not only did XIV meet our expectations, it greatly exceeded them...”
In this blog post, I hope to set the record straight. It is not my intent to embarrass anyone in particular, so instead will focus on a fact-based approach.
Fact: IBM has sold THOUSANDS of XIV systems
XIV is "proven" technology with thousands of XIV systems in company data centers. And by systems, I mean full disk systems with 6 to 15 modules in a single rack, twelve drives per module. That equates to hundreds of thousands of disk drives in production TODAY, comparable to the number of disk drives studied by [Google], and [Carnegie Mellon University] that I discussed in my blog post [Fleet Cars and Skin Cells].
Fact: To date, no customer has lost data as a result of a Double Drive Failure on XIV storage system
This has always been true, both when XIV was a stand-alone company and since the IBM acquisition two years ago. When examining the resilience of an array to any single or multiple component failures, it's important to understand the architecture and the design of the system and not assume all systems are alike. At it's core, XIV is a grid-based storage system. IBM XIV does not use traditional RAID-5 or RAID-10 method, but instead data is distributed across loosely connected data modules which act as independent building blocks. XIV divides each LUN into 1MB "chunks", and stores two copies of each chunk on separate drives in separate modules. We call this "RAID-X".
Spreading all the data across many drives is not unique to XIV. Many disk systems, including EMC CLARiiON-based V-Max, HP EVA, and Hitachi Data Systems (HDS) USP-V, allow customers to get XIV-like performance by spreading LUNs across multiple RAID ranks. This is known in the industry as "wide-striping". Some vendors use the terms "metavolumes" or "extent pools" to refer to their implementations of wide-striping. Clients have coined their own phrases, such as "stripes across stripes", "plaid stripes", or "RAID 500". It is highly unlikely that an XIV will experience a double drive failure that ultimately requires recovery of files or LUNs, and is substantially less vulnerable to data loss than an EVA, USP-V or V-Max configured in RAID-5. Fellow blogger Keith Stevenson (IBM) compared XIV's RAID-X design to other forms of RAID in his post [RAID in the 21st Centure].
Fact: IBM XIV is designed to minimize the likelihood and impact of a double drive failure
The independent failure of two drives is a rare occurrence. More data has been lost from hash collisions on EMC Centera than from double drive failures on XIV, and hash collisions are also very rare. While the published worst-case time to re-protect from a 1TB drive failure for a fully-configured XIV is 30 minutes, field experience shows XIV regaining full redundancy on average in 12 minutes. That is 40 times less likely than a typical 8-10 hour window for a RAID-5 configuration.
A lot of bad things can happen in those 8-10 hours of traditional RAID rebuild. Performance can be seriously degraded. Other components may be affected, as they share cache, connected to the same backplane or bus, or co-dependent in some other manner. An engineer supporting the customer onsite during a RAID-5 rebuild might pull the wrong drive, thereby causing a double drive failure they were hoping to avoid. Having IBM XIV rebuild in only a few minutes addresses this "human factor".
In his post [XIV drive management], fellow blogger Jim Kelly (IBM) covers a variety of reasons why storage admins feel double drive failures are more than just random chance. XIV avoids load stress normally associated with traditional RAID rebuild by evenly spreading out the workload across all drives. This is known in the industry as "wear-leveling". When the first drive fails, the recovery is spread across the remaining 179 drives, so that each drive only processes about 1 percent of the data. The [Ultrastar A7K1000] 1TB SATA disk drives that IBM uses from HGST have specified 1.2 million hours mean-time-between-failures [MTBF] would average about one drive failing every nine months in a 180-drive XIV system. However, field experience shows that an XIV system will experience, on average, one drive failure per 13 months, comparable to what companies experience with more robust Fibre Channel drives. That's innovative XIV wear-leveling at work!
Fact: In the highly unlikely event that a DDF were to occur, you will have full read/write access to nearly all of your data on the XIV, all but a few GB.
Even though it has NEVER happened in the field, some clients and prospects are curious what a double drive failure on an XIV would look like. First, a critical alert message would be sent to both the client and IBM, and a "union list" is generated, identifying all the chunks in common. The worst case on a 15-module XIV fully loaded with 79TB data is approximately 9000 chunks, or 9GB of data. The remaining 78.991 TB of unaffected data are fully accessible for read or write. Any I/O requests for the chunks in the "union list" will have no response yet, so there is no way for host applications to access outdated information or cause any corruption.
(One blogger compared losing data on the XIV to drilling a hole through the phone book. Mathematically, the drill bit would be only 1/16th of an inch, or 1.60 millimeters for you folks outside the USA. Enough to knock out perhaps one character from a name or phone number on each page. If you have ever seen an actor in the movies look up a phone number in a telephone booth then yank out a page from the phone book, the XIV equivalent would be cutting out 1/8th of a page from an 1100 page phone book. In both cases, all of the rest of the unaffected information is full accessible, and it is easy to identify which information is missing.)
If the second drive failed several minutes after the first drive, the process for full redundancy is already well under way. This means the union list is considerably shorter or completely empty, and substantially fewer chunks are impacted. Contrast this with RAID-5, where being 99 percent complete on the rebuild when the second drive fails is just as catastrophic as having both drives fail simultaneously.
Fact: After a DDF event, the files on these few GB can be identified for recovery.
Once IBM receives notification of a critical event, an IBM engineer immediately connects to the XIV using remote service support method. There is no need to send someone physically onsite, the repair actions can be done remotely. The IBM engineer has tools from HGST to recover, in most cases, all of the data.
Any "union" chunk that the HGST tools are unable to recover will be set to "media error" mode. The IBM engineer can provide the client a list of the XIV LUNs and LBAs that are on the "media error" list. From this list, the client can determine which hosts these LUNs are attached to, and run file scan utility to the file systems that these LUNs represent. Files that get a media error during this scan will be listed as needing recovery. A chunk could contain several small files, or the chunk could be just part of a large file. To minimize time, the scans and recoveries can all be prioritized and performed in parallel across host systems zoned to these LUNs.
As with any file or volume recovery, keep in mind that these might be part of a larger consistency group, and that your recovery procedures should make sense for the applications involved. In any case, you are probably going to be up-and-running in less time with XIV than recovery from a RAID-5 double failure would take, and certainly nowhere near "beyond repair" that other vendors might have you believe.
Fact: This does not mean you can eliminate all Disaster Recovery planning!
To put this in perspective, you are more likely to lose XIV data from an earthquake, hurricane, fire or flood than from a double drive failure. As with any unlikely disaster, it is best to have a disaster recovery plan than to hope it never happens. All disk systems that sit on a single datacenter floor are vulnerable to such disasters.
For mission-critical applications, IBM recommends using disk mirroring capability. IBM XIV storage system offers synchronous and asynchronous mirroring natively, both included at no additional charge.
Of these, fellow blogger Marc Farley suggested for me "Tony Late for Dinner Pearson", which is fair, I guess, given that I often work late to make sure my blog posts are well written, and sometimes that means I am the last to leave the building.
Full Disclosure: I've known Marc for a while now, we have attended events together and even were co-speakers on a conference call for customers.
Perhaps more disturbing is that, for the most part, the storage blogosphere is entirely dominated by men. Where are the women bloggers for storage?
People are confused over various orders of magnitude. News of the economic meltdownoften blurs the distinction between millions (10^6), billions (10^9), and trillions (10^12).To show how different these three numbers are, consider the following:
A million seconds ago - you might have received your last paycheck (12 days)
A billion seconds ago - you were born or just hired on your current job (31 years)
A trillion seconds ago - cavemen were walking around in Asia (31,000 years)
That these numbers confuse the average person is no surprise, but that it confuses marketing people in the storage industry is even more hilarious. I am often correcting people who misunderstandMB (million bytes), GB (billion bytes) and TB (trillion bytes) of information.Take this graph as an example from a recent presentation.
At first, it looks reasonable, back in 2004, black-and-white 2D X-Ray images were only 1MBin size when digitized, but by 2010 there will be fancy 4D images that now take 1TB, representinga 1000x increase. What?When I pointed out this discrepancy, the person who put this chart together didn't know what to fix.Were 4D images only 1GB in size, or was it really a 1000000x increase.
If a 2D image was 1000 by 1000 pixels, each pixel was a byte of information, then a 3D imagemight either be 1000 by 1000 by 1000 [voxels], or 1000 by 1000 at 1000 frames per second (fps). Thefirst being 3D volumetric space, and the latter called 2D+time in the medical field, the rest of us just say "video".4D images are 3D+time, volumetric scans over time, so conceivably these could be quite large in size.
The key point is that advances in medical equipment result in capturing more data, which canhelp provide better healthcare. This would be the place I normally plug an IBM product, like the Grid Medical Archive Solution [GMAS], a blended disk and tape storage solution designed specifically for this purpose.
So, as government agencies look to spend billions of dollars to provide millions of peoplewith proper healthcare, choosing to spend some of this money on a smarter infrastructure can result in creating thousands of jobs and save everyone a lot of money, but more importantly, save lives.
Short 2-minute [video] argues the case for Smarter Healthcare
For more on this, check out Adam Christensen's blog post on[Smarter Planet], which points to a podcast byDr. Russ Robertson, chairman of the Counsel of Medical Education at Northwestern University’s Feinberg School of Medicine, and Dan Pelino, general manager of IBM's Healthcare and Life Sciences Industry.
Wrapping up this week's theme on ways to make the planet smarter, and less confusing, I present IBM's third annual [five in five]. These are five IBM innovations to watch over the next five years, all of which have implications on information storage. Here is a quick [3-minute video] that provides the highlights:
This week is Thanksgiving holiday in the USA, so I thought a good theme would be things I am thankful for.
I'll start with saying that I am thankful EMC has finally announcedAtmos last week. This was the "Maui" part of the Hulk/Maui rumors we heard over a year ago. To quickly recap, Atmos is EMC's latest storage offeringfor global-scale storage intended for Web 2.0 and Digital Archive workloads. Atmos can be sold as just software, or combined with Infiniflex,EMC's bulk, high-density commodity disk storage systems. Atmos supports traditionalNFS/CIFS file-level access, as well as SOAP/REST object protocols.
I'm thankful for various reasons, here's a quick list:
It's hard to compete against "vaporware"
Back in the 1990s, IBM was trying to sell its actual disk systems against StorageTek's rumored "Iceberg" project. It took StorageTek some four years to get this project out,but in the meantime, we were comparing actual versus possibility. The main feature iswhat we now call "Thin Provisioning". Ironically, StorageTek's offering was not commercially successful until IBM agreed to resell this as the IBM RAMAC Virtual Array (RVA).
Until last week, nobody knew the full extent of what EMC was going to deliver on the many Hulk/Maui theories. Severalhinted as to what it could have been, and I am glad to see that Atmos falls short of those rumored possibilities. This is not to say that Atmos can't reach its potential, and certainly some of the design is clever, such as offering native SOAP/REST access.
Instead, IBM now can compare Atmos/Infiniflex directly to the features and capabilities of IBM's Scale Out File Services [SoFS], which offers a global-scale multi-site namespace with policy-based data movement, IBM System Storage Multilevel Grid Access Manager[GAM] that manages geographical distrubuted information,and IBM [XIV Storage System] that offers high-density bulk storage.
Web 2.0 and Digital Archive workloads justify new storage architectures
When I presented SoFS and XIV earlier this year, I mentioned they were designed forthe fast-growing Web 2.0 and Digital Archive workloads that were unique enough to justify their own storage architectures. One criticism was that SoFS appeared to duplicate what could be achieved with dozens of IBM N series NAS boxes connected with Virtual File Manager (VFM). Why invent a new offering with a new architecture?
With the Atmos announcement, EMC now agrees with IBM that the Web 2.0 and DigitalArchive workloads represent a unique enough "use case" to justify a new approach.
New offerings for new workloads will not impact existing offerings for existing workloads
I find it amusing that EMC is quickly defending that Atmos will not eat into its DMXbusiness, which is exactly the FUD they threw out about IBM XIV versus DS8000 earlier this year. In reality, neither the DS8000 nor the DMX were used much for Web 2.0 andDigital Archive workloads in the past. Companies like Google, Amazon and others hadto either build their own from piece parts, or use low-cost midrange disk systems.
Rather, the DS8000 and DMX can now focus on the workloads they were designed for,such as database applications on mainframe servers.
Cloud-Oriented Storage (COS)
Just when you thought we had enough terminology already, EMC introduces yet another three-letter acronym [TLA]. Kudos to EMC for coining phrases to help move newconcepts forward.
Now, when an RFP asks for Cloud-oriented storage, I am thankful this phrase will help serve as a trigger for IBM to lead with SoFS and XIV storage offerings.
Digital archives are different than Compliance Archives
EMC was also quick to point out that object-storage Atmos was different from theirobject-storage EMC Centera. The former being for "digital archives" and the latter for"compliance archives". Different workloads, Different use cases, different offerings.
Ever since IBM introduced its [IBM System Storage DR550] several years ago, EMC Centera has been playing catch-up to match IBM'smany features and capabilities. I am thankful the Centera team was probably too busy to incorporate Atmos capabilities, so it was easier to make Atmos a separate offering altogether. This allows the IBM DR550 to continue to compete against Centera's existingfeature set.
Micro-RAID arrays, logical file and object-level replication
I am thankful that one of the Atmos policy-based feature is replicating individualobjects, rather than LUN-based replication and protection. SoFS supports this forlogical files regardless of their LUN placement, GAM supports replication of files and medical images across geographical sites in the grid, and the XIV supports this for 1MBchunks regardless of their hard disk drive placement. The 1MB chunk size was basedon the average object size from established Web 2.0 and DigitalArchive workloads.
I tried to explain the RAID-X capability of the XIV back in January, under muchcriticism that replication should only be done at the LUN level. I amthankful that Marc Farley on StorageRap coined the phrase[Micro-RAID array] to helpmove this new concept further. Now, file-level, object-level and chunk-level replication can be considered mainstream.
Much larger minimum capacity increments
The original XIV in January was 51TB capacity per rack, and this went up to 79TB per rack for the most recent IBM XIV Release 2 model. Several complained that nobody would purchase disk systems at such increments. Certainly, small and medium size businessesmay not consider XIV for that reason.
I am thankful Atmos offers 120TB, 240TB and 360TB sizes. The companies that purchasedisk for Web 2.0 and Digital Archive workloads do purchase disk capacity in these large sizes. Service providers add capacity to the "Cloud" to support many of theirend-clients, and so purchasing disk capacity to rent back out represents revenue generating opportunity.
Renewed attention on SOAP and REST protocols
IBM and Microsoft have been pushing SOA and Web Services for quite some time now.REST, which stands for [Representational State Transfer] allows static and dynamic HTML message passing over standard HTTP.SOAP, which was originally [Simple Object Access Protocol], and then later renamed to "Service Oriented Architecture Protocol", takes this one step further, allowingdifferent applications to send "envelopes" containing messages and data betweenapplications using HTTP, RPC, SMTP and a variety of other underlying protocols.Typically, these messages are simple text surrounded by XML tags, easily stored asfiles, or rows in databases, and served up by SOAP nodes as needed.
It's hard to show leadership until there are followers
IBM's leadership sometimes goes unnoticed until followerscreate "me, too!" offerings or establish similar business strategies. IBM's leadership in Cloud and Grid computing is no exception.Atmos is the latest me-too product offering in this space, trying pretty muchto address the same challenges that SoFS and XIV were designed for.
So, perhaps EMC is thankful that IBM has already paved the way, breaking throughthe ice on their behalf. I am thankful that perhaps I won't have to deal with as much FUD about SoFS, GAM and XIV anymore.
In his post on Rough Type titled ["McKinsey surveys the new software landscape"], Nick Carr discusses the growing acceptance in the marketplace for Software-as-a-Service, or SaaS.He summarizes the results of McKinsey's recent[Enterprise Software Customer Survey 2008].IBM is already well established as part of the Web 2.0 Big "5" (the other four are Google, Yahoo, Amazon and Microsoft), so it may not be much surprise that it introduced some new offerings focused on this emerging market.
For managed hosting, [IBM Managed Storage Services] hasbeen extended to support archive data through its entire lifecycle: supporting access, migration, non-erasablenon-rewriteable (NENR) protection, and expiration/destruction. This offering supports locating the storage onthe customer premises, a hosting center, or an IBM Service Deliver Center. IBM's blended disk and tape approachprovides a better alignment between information value and storage costs.
Last December IBM acquired Arsenal Digital, which offers a remote "Enterprise Email Archive" service, supporting retention policies that can apply per user,per group, or even my message, as needed. This service provides fast user access to email archives, as well as e-discovery search. The search is not just for the email body text, but supports over 370different attachment types as well. Deduplication technology is used to reduce the actual amount of storage needed by 80percent. All of this with the security and comfort of knowing that these email archives are encrypted and protected in a disaster recovery class datacenter managed by IBM.Blocks and Files presents their thoughts on this in the article["IBM storing data and mail in the cloud"].
The Radicati Group has published some interesting statistics about email archive in[Volume 4, Issue 3]. Here's an excerpt:
"In 2007, a typical corporate email account receives about 18 MB of data per day. This number is expected to grow to over 28 MB by 2011. Today, there is no way to effectively manage these messages, but with the help of an archiving solution.
Today, the worldwide percentage of corporate mailboxes protected by archiving solutions is estimated to be around 14%, however it is growing at a fast pace, and is expected to reach over 70% by 2011.
A survey of 102 corporate organizations worldwide, showed that 68% of large businesses view compliance as their top security concern in 2007."
For those who are actually providing these services to others, over the cloud, then you might want to use the new[IBM System x iDataPlex].Compared to traditional server environments, the iDataPlex provides five times the computing power by doubling the number of servers per rack, but with 40 percent less energy consumption. Thanks to clevercooling technology, the system can run in standard office "room temperature" environments. You cancustomize with a mix of compute, network and storage nodes to meet your application requirements.In addition to Web 2.0 and SaaS workloads, the iDataPlex can be useful for financial risk analysis,high performance computing, and even batch processing.
Whether you are looking to contract out for SaaS, or to provide a service to others over the cloud, IBM can help!
Tim Ferris started the festivities with [The Grand Illusion: The Real Tim Ferriss speaks]. He claimed that for the past year, he outsourced the writing of his blog to a writer from India, and an editor from the Philippines. Given that his post was dated March 31, and he writes frequently about the benefits of outsourcing, it appeared like a legitimate post. However, Tim fessed up the following day, claiming that it was April 1 in Japan where he wrote it.
Guy Kawasaki wrote[April Fools' Stories You Shouldn't Believe]including my favorite #12 "Ruby on Rails cited Twitter as the centerpiece of its new 'Rails Can Scale' marketing program." Speaking of Twitter, Fellow IBM blogger Alan Lepofsky from our Lotus Notes team wrote[Great, now there is Twitter Spam]. It looked like a real post, but then I realized, ... everything on Twitter is spam!
Topics like energy consumption and global warming were fodder for posts and pranks.The post[Was Earth Hour a joke again?], argued thatthe preparation of "Earth Hour" last week in effect used up more energy than the hour of this annual "lights-off event" actually saved. This reminded me of John Tierney's piece in the New York Times ["How virtuous is Ed Begley, Jr.?"] where a scientist explains that it is more "green" for the environment to drive a car short distances than to walk:
If you walk 1.5 miles, Mr. Goodall calculates, and replace those calories by drinking about a cup of milk, the greenhouse emissions connected with that milk (like methane from the dairy farm and carbon dioxide from the delivery truck) are just about equal to the emissions from a typical car making the same trip. And if there were two of you making the trip, then the car would definitely be the more planet-friendly way to go.
Wayan Vota, my buddy over at OLPCnews, writes in his post[Windows XO Child Centric Development] that the "Sugar" operating environment on the innovative Linux-based XO laptops will soon be re-named the"Windows XO Operating System", with their new motto "Windows XO: A Child-Centric Operating Platform for Learning, Expression and Exploration." The mocked up photo of an XO laptop with the Windows XO logo was excellent!
The economists from Freakonomics explain in [And While You're at it, Toss the Nickel] that it costs the US Government 1.7 cents to produce each penny. The US government loses $50 million dollars each year making pennies. Each nickel costs 10 cents to produce. This one was dated March 31, so it could actually be true. Sad, but true.
My favorite, however, was EMC blogger Barry Burke's post["5773 > c"] explaining howtheir scientists were able to reduce latency on the EMC SRDF disk replication capability:
What the de-dupe team found is that there is a hidden feature within recent generations of this chip that allow a single bit, under certain circumstances, to represent TWO bits of information.
Still, almost 34% of the total bits transferred were in fact aligned double-zeros, far more than all other bit combinations - and most importantly, these were quite frequently byte-aligned, as required by this new-found capability. Makes sense, if you think about it - most of those 32- and 64-bit integers are used to store numbers that are relatively small (years, months, days, credit charges, account balances, etc.). So that's why the team decided to use this new two-fer bit to represent "00".
Mathematically, if you can transmit 34% of the data using half as many bits, you reduce the number of bits you have to transfer in total by 17%. Which, while not necessarily earth-shattering, is nothing to be ashamed of. On top of the SRDF performance enhancements delivered in 5772 (30% reduction in latency or 2x the distance), this new enhancement adds another 17% latency improvement (or ~1.4x more distance at the same latency). Combined with 5772, SRDF/S customers could see a 50% reduction in latency. And 5773 allows SRDF/A cycle times to be set below 5 seconds (with RPQ) - this new feature adds a little headroom to maximize bandwidth efficiency for the shortest possible RPO.
Again, this looked real, until I did the math. Start with the speed of light in a vacuum of space ("c" in BarryB's title) which is roughly 300,000 kilometers per second, or put into more understandable units, 300 kilometers per millisecond. However, light travels slower through all other materials, and for fiber optic glass it is only 200 kilometers per millisecond. Sending a block of data across 100km, and then getting a response back that it arrived safely, is a total round-trip distance of 200km, so roughly 1 millisecond. However, EMC SRDF often takes two or three round-trips per write, versus IBM Metro Mirror on the IBM System Storage DS8000 which has got this down to a single round-trip. The number of round-trips has a much bigger effect on latency than EMC's double-bit data compression technique. With IBM, you only experience about 1 millisecond latency per write for every 100km distance between locations, the shortest latency in the industry.
It is good that once a year, you should be skeptical of what you read in the blogosphere, and sometimes check the facts!
I got some interesting queries about IBM's Scale-Out File Services [SoFS] that I mentioned in my post yesterday [Area rugs versus Wall-to-Wall carpeting]. I thought I would provide some additional details of the product.
SoFS combines three key features: a global namespace, a clustered file system, and Information LifecycleManagement (ILM). Let's tackle each one.
Global Name Space
A long time ago, IBM acquired a company called Transarc that developed Andrew File System (AFS) and DistributedFile System (DFS). These both provided global namespace capability, meaning that all of your files could beaccessible from a single URL file tree. Imagine if you have data centers in Tucson, Austin, Raleigh and Chicago.Normally, to access files from each city, you would have to mount a unique IP address for that location, and thento get to files in a different city, you'd have to mount a second, and so on. But with a global namespace, you could mount a single drive letter Z: and access files simply by using Z:/Tucson/abc or Z:/Austin/xyz. IBM uses its DFS to make this happen.
Just because you have access to a global namespace doesn't give you read/write authority to every file. IBM SoFS has full NTFS Access Control List (ACL) support, so that only those who can read or write data can access the files. A "hide unreadable" feature provideswhat I like to call "parental controls": you don't even get to see on your directly list any file or subdirectory that you don't have access to. For example, if there is a directory with 50 projects, but you only have authority tothree projects, then you only see the three subdirectories related to those projects, and nothing else.
There are other ways to get a global namespace. IBM also offers the IBM System Storage N series Virtual FileManager, Brocade offers Storage/X, and F5 acquired Acopia. These all work by putting a box in front of a set ofindependent NAS storage units, and giving you a single mount point to represent all of the file systems managedbehind the scenes. This however can sometimes be a bottleneck for performance.
Clustered File System
Often, when you have a lot of data in one place, you are also expected to deliver that data to lots of clientswith relatively good performance. Otherwise, end users revolt and get their own internal direct attach storage.To solve this, you need a clustered architecture that provides access in parallel to the data.
First, we start with a node that is optimized for CIFS and NFS access. We have clocked our node to run CIFS at577 MB/sec, and NFS at 880 MB/sec, through a 10GbE pipe between a single client and a single SoFS node. Comparethat to the 400 MB/sec you get today with 4Gbps FCP, or the 800 MB/sec you will get if you upgrade to 8 GbpsFCP, and quickly you recognize that this is comparable performance for demanding workloads.
Then, you combine multiple nodes together, and have them all be able to read/write any file in the file system, andfront-end that with a load-balancing Virtual IP address (VIPA) that spreads the requests around, and you've gotyourself a lean and mean machine for accessing data.
In 2005, IBM delivered[ASC Purple] with the world's fastest file system. 1536 nodeswere able to access billions of files in the 2 Petabyte of data. The record of 126 GB/sec access to a single filewas set, and has yet to be beaten by any other vendor since.This same file system is used in SoFS, as well as a variety of other IBM storage offerings.
The back-end storage can be SAS or FC-attached, from the DS3200 to our mighty DS8300 Turbo, as well as ourIBM System Storage DCS9550 and SAN Volume Controller (SVC), and a variety of tape libraries.
Information Lifecycle Management
Lastly, we get to ILM. With SoFS, you can have different tiers of storage, high-speed SAS or FC disk, low-speedFATA and SATA disk, and even tape. Policy-based automation allows you to place any file onto any disk tier whencreated, and other policies can migrate or delete the data trigged by certain threshold, age, or other criteria.The advantage is that this is on a file by file basis, so Z:/Tucson/Project could have a bunch of files, some ofthem on my FC disk, some of them on my SATA, and some on tape. The file path doesn't change when they move, anddifferent files in the same directory can be on different tiers.
Data movement is bi-directional. If you know you will be using a set of files for an upcoming job, say perhapsquarter-end or year-end processing, you can pre-fetch those files from tape and move them to your fastest disk pool.
There is also integrated backup support. Typically, a large NAS environment is difficult to backup. Traditionalmethods take days to scan the directory tree looking for files in need of backup. A single SoFS node can scana billion files in 95 minutes, and 8 nodes in a cluster can scan a billion files in under 15 minutes.
Recovery is even more impressive. When you recover, SoFS brings back the entire directory structure first, withall the file names in place. This would make it appear that all the data is restored, but actually it is still on tape.When you access individual files, it will then drive the recovery of that file, so your applications and end usersbasically determine the priority of the recovery. Traditional methods would wait until every file was restoredbefore letting anyone access the system.
SoFS is part of IBM's [Blue Cloud] initiativethat was launched last November 2007. Of course, IBM isn't the only one competing in this space. HDS has partneredwith BlueArc, HP has acquired PolyServe, and Sun acquired CFS for their Lustre file system. Isilon and Exanet arestart-up companies with some offerings. EMC acquired Rainfinity,and have hinted at a Hulk/Maui project that they might deliver later this year or perhaps in 2009, but by thenmight be a dollar-short and a day-late.
But why wait? IBM SoFS is available today and is orders of magnitude more scalable!
At IBM, our standard is to have a limit of 200MB per user mailbox. A few of us get exceptions and have up to500MB limit because of the work we do. By comparison, my personal Gmail account is now up to 6500MB. Whenthis limit is exceeded, you are unable to send out any mail until it is brought down below the limit, and a request to be "re-enabled for send" is approved, a situation we call "mail jail".
The biggest culprit are attachments. Only 10 percent of emails have attachments, but those that do take up 90percent of the total space! People attach a 15MB presentation or document, and copy the world ondistribution list. Everyone saves their notes with these attachments, and soon, the limits are blown. Not surprisingly, deduplication has been cited as a "killer app" to address email storage, exactly for this reason.If all the users have their mailboxes all stored on the same deduplication storage device, it might find theseduplicate blocks, and manage to reduce the space consumed.
A better practice would be to avoid this in the first place. Here are the techniques I use instead:
Point to the document in a database
We are heavy users of Lotus Notes databases. These can be encrypted and controlled with Access Control Lists (ACL)that determine who can create or read documents in each database. Annually, all the database ACLs are validatedso that people can confirm that they continue to have a need-to-know for the documents in each database. Sendinga confidential document as a "document link" to a database entry takes only a few bytes, and all the recipientsthat are already on the ACL have access to that document.
Point to the document on a web page
If the document is available on an internal or external website, just send the URL instead of attaching the file.Again, this takes only a few bytes. We have websites accessible only to all internal employees, websites thatcan be accessed only by a subset of employees with special permissions and credentials based on their job role, and websites that are accessible to our IBM Business Partners.
In my case, if I happen to have a blog posting that answers a question or helps illustrate an idea, I will sendthe "permalink" URL of that blog post in my email.
Point to the document on shared NAS file system
Internally, IBM uses a "Global Storage Architecture" (GSA) based on IBM's Scale-Out File Services [SoFS] with everyone getting initially 10GB of disk space to store files, with the option to request more if needed. The system has policy-based support for placing and migrating older data to tape to reduce actual disk usage, and combines a clustered file system with a global name space.
My SoFS space is now up to 25GB, and I store a lot of presentationsand whitepapers that are useful to others. A URL with "ftp://" or "http://" is all you need to point to a filein this manner, and greatly reduces the need for attachments. I can map my space as "Drive X:" on my Windows system,or as a NFS mount point on my Linux system, which allows me to easily drag files back and forth.
Departments that don't need to offer "worldwide access" use NAS boxes instead, such as the IBM System Storage N series.
Pointing to files in a shared space, rather than as attachments in email, may take some getting used to. I've hada few recipients send me requests such as "can you send that as an attachment (not a URL)" because they plan toread it on the airplane or train, where they won't have online connectivity.
"Have you invested in the latest and greatest in collaboration technology but still feel people are still not collaborating? How many Microsoft Sharepoint servers and IBM Quickplaces remain relatively untouched or only used by the organization's technorati? I think it's a big problem because this narrow view of collaboration starts to get the concept a bad name: "yeah, we did collaboration but no one used it." And then there the issue of the vast amount of money wasted and opportunities lost. We can't afford to loose faith in collaboration because the external environment is moving in a direction that mandates we collaborate. The problems we face now and into the future will only increase in complexity and it will require teams of people within and across organizations to solve them."
Well, sending pointers instead of attachments works for me, and has kept me out of "mail jail" for quite some timenow.
In addition to creating the Dilbert cartoon, Scott Adams has a blog, which sometimes is quite serious,and other times quite funny. The anticipated 30x cost of "Flash Drives" for Enterprise disk systems reminded meof one of Scott's articles from November 2007 titled [Urge to Simplify].Here's an excerpt:
Now the casinos have people trained, like chickens hoping for pellets, to take money from one machine (the ATM), carry it across a room and deposit in another machine (the slot machine). I believe B.F. Skinner would agree with me that there is room for even more efficiency: The ATM and the slot machine need to be the same machine.
The casinos lose a lot of money waiting for the portly gamblers with respiratory issues to waddle from the ATM to the slot machines. A better solution would be for the losers, euphemistically called “players,” to stand at the ATM and watch their funds be transferred to the hotel, while hoping to somehow “win.” The ATM could be redesigned to blink and make exciting sounds, so it seems less like robbery.
I’m sure this is in the five-year plan. Longer term, people will be trained to set up automatic transfers from their banks to the casinos. People will just fly to Vegas, wander around on the tarmac while the casino drains their bank accounts, then board the plane and fly home. The airlines are already in on this concept, and stopped feeding you sandwiches a while ago.
Perhaps EMC can redesign its DMX-4 to "blink and make exciting sounds" as well. The Flash Drives were designedfor the financial services industry, so those disk systems could be directly connected to make transfers between the appropriate bank accounts.
I'll love to hear from you (I post letters from authors!) about how you put the blook together. Many folks have used cut and paste from blog page into word processor. Others have simply backed up their blogs, then cut and pasted. Some folks had the foresight to compose their posts in a word processor before posting!
Anyway, I'd like to know whatever ins and outs you'd like to share. Thanks.
Well Cheryl, I couldn't find any email address to send you a response, so Idecided to post here instead and post a traceback on your blog.
Software: Office 2003 version of Microsoft Word on Windows XP system
Front matter: Title, Copyright, Dedication, Table of Contents, Foreword, Introduction
Back matter: Blog Roll, Blogging Guidelines, Glossary, Reference table, What people have written about me and my blog
According to Lulu, you could use OpenOffice instead with RTF files. I didn't try that. I did tryusing CutePDF to upload ready-made PDFs, that didn't work. I also tried saving text in PDF formaton my Mac Mini running OS X 10.4 Tiger, but Lulu didn't like that either.IBM now offers a free download of [LotusSymphony] that might be an alternative for my next book.
For my blook, the "Blog Roll" serves instead of a more formal [Bibliography]. I could have also includedonline magazines and other web resources.
Decision 2: Chapter Configuration
I reviewed other blooks to see how they were organized. I thought I might organize the blog posts by topic or category, but all the blooks I looked atwere strictly chronological, oldest post first. This of course is exactly opposite as theyappear on the web browser. I decided to keep things simple, with just 12 chapters, one for each calendar month.
Each chapter was separated by a section break with unique footers, starting on odd page number. The footers have the page numbers on the outside edges, so that even pages had numbers on the left, and odd pages on the right. I also added the name of the chapter and the book, like so:
--------------------------------| |---------------------------- 40 ................December 2006| |Inside System Storage.... 41
This was a lot of work, but makes the book look more "professional".
Decision 3: Cut-and-Paste
People have asked me why it took three months to put my blook together, and I explainedthat the cut-and-paste process was manually intensive. My posts are either HTML entereddirectly into Roller webLogger, or typed in HTML on Windows Notepad and cut-and-pastedover to Roller later. I have access to the HTML source of each post, as wellas how it appears on the webpage, and tried cut-and-paste both ways. Copying theHTML source meant having to edit out all the HTML tags. I hadn't even looked into the idea of "backing up" through Roller all the entries, but they would probably have been HTMLsource as well.
In turned out that copying the webpage directly from the browser was better, which retains more of the formatting,and automatically eliminates all of the pesky HTML tags. I wanted the printed versions to resemblethe web page version.
Microsoft Word indicates all hyperlinks as bright blue underlined text which I didn't like, so I removedall hyperlinks, to avoid having to pay extra for "colored pages". This can be done manually, one by one, or pasting with the "text only" option butthis removes out all the other formatting as well. (Specifying black-and-white interior on Lulu might have converted all of these automaticallyto greyscale, so I might have been safe to leave them in,which I probably could have done if I wanted an online e-book version with links active, ... oh well)
To indicate where the hyperlinks would have been, I wrapped all the linked text in[square brackets]. I have now gotten in the habit of doing this for future blog posts, soif I ever make another book, it will cut down the work and effort on the cut-and-paste.
Some of the items I linked to posed a problem. I had to convert YouTube videos to flat imagesof the first frame to include them into the book. Older links were broken, and I had tofind the original graphics. I also sent a note to Scott Adams related about the use of one of his Dilbert cartoons.
I decided to also cut-and-paste my technorati tags and comments. For comments I mademyself, I labeled them "Addition" or "Response". A few people did not realize thatI was "az990tony" making the comments as the blog author, so I changed all to say "az990tony (Tony Pearson)" to make this more clear, and now do this on all future blogposts to minimize the work for my next book.
Because I used a lot of technical terms and acronyms, Microsoft Word actually gave mean error message that there were so many gramattical and spelling errors that it wasunable to track them all, and would no longer put wavy green or red lines underneath.
I did all the cut-and-paste work myself, but since the website is publicly accessible,I could have gotten someone else to do this for me.Had I read Timothy Ferriss' book The Four Hour Work Week sooner,I might have taken his advice on [Outsourcing the project to someone in India]. I might consider doing this for my next book.
Decision 4: Numbering the Posts
I decided I wanted to standardize the title of each post. The date was not uniqueenough, as there were days that I made multiple posts. So, I decided to assign eacha unique number, from 001 to 165, like so:
2006 Dec 12 - The Dilemma over future storage formats (033)
Posts that referred back to one of my earlier posts within the book had (#nnn) added so that readers couldgo jump back to them if they were interested. This eliminated trying to keep track of pagenumbers.
Decision 5: Adding behind-the-scenes commentary
One of the reasons I rent or buy DVDs is for the director's audio commentary and deleted scenes. These extras provided that added-value over what I saw in the movietheatre. Likewise, 80 percent of a blook is already out in the public for reading, so I felt I needed to provide some added value. At the beginning of each month, I describewhat is going on behind the scenes, and then in front of specific posts, I providedadditional context. This could be context of what was going on in the blogosphere at thetime, announcements or acquisitions that happened, what country I was blogging from, orwhat unannounced products or projects that were being developed that I can now talk aboutsince they are now announced and available.
To distinguish these side comments from the rest of the blog posts,I decorated them with graphics. Searching for copyright-free/royalty-free clip-art, graphics, and photos that represented eachconcept was time-consuming. I shrunk each down to about 1 inch square in size, and changed themfrom color to greyscale. (LuLu conversion to PDF probably would have automaticallyconverted the color graphics to greyscale for me, in which case leaving them in full colormight have been nice for an e-book edition, ... oh well)
I did complete each chapter one at a time. So, for each month, I cut-and-pasted all the blog posts,tags and comments, then fixed up and numbered all the post titles, then added all the behindthe scenes commentary, and cleaned up all the font styles and sizes. I recommend you do this at least for the first chapter, so you can get a good feel for what the finished version will look like.
Decision 6: Adding a Glossary
I sent early copies of the books to five of my coworkers knowledgeable about storage, andfive local friends who know nothing about storage.
Some of my early reviewers suggested having an index, so that people can find a specific poston a particular topic. Others suggested I spell out all the acronyms that appear everywhereand put that into the Reference section, rather than on each and every occurrence inthe book itself. Both were good ideas, and my IBM colleague Mike Stanek suggested calling ita GOAT (Glossary of Acronyms and Terms). Acronyms are spelled out, and terms or phrasesthat need additional explanation have a glossary definition. For eachitem, I put the post or posts that uses that term. Some terms are covered in dozens ofposts, so I tried to pick five or fewer posts representing the most pertinent.
The glossary was far more time-consuming than I first imagined, with over 50 pages containingover 900 entries. I struggled deciding which terms and acronyms needed explanation, and which were obvious enough. On the good side, itforced me to read and re-read the entire book cover to cover, and I caught a lot of othermistakes, misspellings, and formatting errors that way. Also, I have a large internationalreadership on my blog, so the glossary will help those whose English is not their native language,and will help those readers who are not necessarily experts in the storage industry.
Decision 7: Designing the Covers
Up to this point, I had been printing early drafts with simple solid color covers. Lulu hasthree choices for covers:
Just type in the text, upload an "author's photo" and chose a background color or pattern
Upload PNG files, one for the front cover, one for the back cover, and chose the textand color of the spine.
Upload a single one-piece PDF file that wraps around the entire book.
I had no software to generate the PDF for the third option, so I decided to try the secondoption. My first attempt was to format the front title page in WORD, capture the screen,convert to PNG and upload it as the front cover. I did same for the back cover, with a smallpicture of me and some paragraphs about the book.
I chose a simple straightforward title on purpose. Thousands of IBM and other IT marketing and technicalpeople will be ordering this book, and submitting their expenses for reimbursement as work-related, and didn't want to cause problems with a cute title like "An Engineer in Marketing La-La Land".
The next step was to use [the GIMP] GNU image manipulationprogram, similar to PhotoShop, to add a cream colored background, a slanted green spine, and some graphics that we had developed professionally for some of our IBM presentations.I learned how to use the GIMP when making tee-shirts and coffee mugs for our [Second Life] events, so I was already familiar. For newblook authors, I suggest they learn how to use this for their covers, or find someone who can do thisfor them.
I did the paperback version first, and once done, it was easy to use the same PNG files forthe dust jacket of the hardcover edition, adding some extra words for the front and back flaps.
The adage "Don't judge a book by its cover" seems to apply to everything except booksthemselves. The book cover is the first impression online, and in a bookstore. I have seenpeople pick books up off the shelf at my local Barnes & Noble, read the front and back covers, peruse the front and backflaps, and make a purchase decision without ever flipping a single page of the contents inside.From an article on Book Catcher [SELF-PUBLISHING BOOK PRODUCTION & MARKETING MISTAKES TO AVOID]:
According to selfpublishingresources website, three-fourths of 300 booksellers surveyed (half from independent bookstores and half from chains) identified the look and design of the book cover as the most important component of the entire book. All agreed that the jacket is the prime real estate for promoting a book.
While many struggle to find the right title and cover art, I think it is interesting that Lululets you post the same book with slightly different titles and covers, each as separate projects, and let market forces decide which one people like best. This is a common practice among marketresearch firms.
Decision 8: Finding someone to write the Foreword
With the book nearly done, I thought it would be a nice touch to have an IBM executive write a Foreword at the frontof the book. Several turned me down, so I am glad I found a prominent Worldwide IBM executiveto do it. I should have started this process sooner, as she wanted to read my book in its entirety beforeputting pen to paper. I had not planned for this. I was hoping to be done by end of October,but waiting for her to finish writing the Foreword added some extra weeks. Next time,I will start this process sooner.
Decision 9: Printing Early Drafts
You need to have Lulu print at least one copy to review before making it available to the public,and it doesn't hurt to order a few intermediary draft copies to make sure everything looks right.However, from the time I order it on Lulu, to the time it is in my hands, is over two weeks withstandard shipping, so I needed a way to print drafts to look at in between.
To avoid wear-and-tear on my color ink-jet printer, I went and bought a large black-and-white[Brother HL-5250DN] laser printer. Rather than buying specialty 6x9 paper, I used standard 8.5x11 paperusing the following 2-up duplex method:
Upload the DOC file to Lulu, and get it converted to PDF
Download the resulting PDF from Lulu back to your computer
View the PDF in Adobe Reader, and print it using 2-up "Booklet" mode.
For example, if you print 60 pages in booklet mode, it prints two mini-pages on thefront side, and two more mini-pages on the back side of each sheet of paper, resulting in 15 standard 8.5" x 11" pages that can be folded, stapled, and read like a mini-booklet. My entire blook could be printed on seven of these mini-booklets, saving paper, and giving me a close approximation to what the final book would look like. Eachmini-page is 5.5"x8.5", so just slightly smaller than the final 6"x9" form factor.I fount that 60 pages/15 sheets was about the maximum before it becomes hard to fold in half.
So, if I had to do it all over again, I might have chosen 11pt Garamond (the default), or changedthe default to 11pt Book Antiqua up front, so as not to have spend so much time converting thefonts. I might have left out the glossary. I might have left in all the hyperlinks and graphicsin full color for a separate e-book edition. And I definitely would have looked for an author formy Foreword much earlier in the process.
I didn't plan to write a blook when I started blogging. I have started putting [square brackets]around all my links. I have started putting "az990tony (Tony Pearson)" on all my comments. I hadassumed that people were jumping to all the links I provided in context, but I learned that the blogpost has to stand on its own, so now I make sure that I either paraphrase the important parts, oractually quote the text that I feel is important, so that the blog post makes sense on its own.This is perhaps good advice in general, but even more important if you plan to write a blook later.
Lastly, I decided up front to write blog posts that were 500-700 words long, about the average lengthof magazine or newspaper articles. In my blook, the average is 639 words per post, so I hit thatgoal. I have seen some blogs where each post is just a few sentences. Maybe they are posting fromtheir cell phone, or don't have time to think out a full thought, but who wants to read a year'sworth of [twitter] entries.
Well Cheryl, I hope that helps. If you need anymore, click on the "email" box on the right panel.
Continuing my week's theme on Innovations that matter, I thought I would tackle energy efficiency and the recent excitement over the Smart car.
USA Today had an article [America crazy about breadbox on wheels called Smart car]. This car weighs only 2400 pounds, gets a respectable 33 MPG City,and 40 MPG Highway, with a list price of $11,590 US dollars. These have been in Europe for some time now.The "Smart" name comes from combining the S from Swatch, the M from Mercedes and ART. The car was designed byNicholas Hayek, founder of the SWATCH wristwatch line, and manufactured by Daimler, who also makes Mercedes cars.
We have many communities here in Tucson that people drive street-legal golf carts. People don't realize but bothelectric and electric/gas hybrid golf carts have been around for a long time. Some of the nicer golf carts run forabout $7,000 US dollars, with a shelf on the back that can hold two sets of golf clubs, or groceries.Of course, you would never take a golf cart on the highway, so that is where the Smart car comes in, with a 10gallon tank, could easily get you from one major city to another.
Like golf carts, the Smart-for-Two model being sold in the US will hold only two people, which is perfect for manyAmerican families. The standard 4-person or 5-person sedan is too big for most DINKS (Dual Income, No Kids), and other families with kids often opt for the 7-person SUV instead.
It is good to see that energy consumption is finally getting the attention it deserves. IBM recently announced some exciting offerings to help data centers manage their energy consumption:
IBM Systems Director Active Energy Manager V3.1 [AEM]:
A new, key component of IBM's [Cool Blue portfolio] offering, AEM helps clients manage and even potentially lower energy costs. According to Gartner, insufficient power and excessive heat remain the greatest challenges in the data center. With AEM, IT managers can understand exact power/cooling costs, manage the efficiency of the current environment and reduce energy costs. AEM is the only energy management software tool that can provide clients with a single view of the actual power usage across multiple IBM platforms, including x86, blades, Power and storage systems, with plans to extend support to the mainframe.
IBM Usage and Accounting Manager Virtualization Edition V7.1 [UAV]for System p and System x:
UAV gives IT managers more information to manage data center costs. These powerful usage management tools are designed to accurately measure, analyze, and report resource utilization of virtualized/consolidated/shared resources. With UAV, IT managers can better manage costs and justify new systems by determining who is using how much of which resource; assessing the cost of an IT service or application; and accurately charging each user or department. Working with AEM capabilities, it will also allow tracking of energy consumption costs by server and by user. This level of reporting eliminates a key inhibitor to the adoption of virtualization and consolidation and further differentiates IBM systems.
This solution -- ideal for heterogenous IT shops -- serves as an accurate measurement tool underlying billing processes and SLA compliance. UAM provides usage-based accounting and charging for virtually any IT resources across the enterprise -- ranging from mainframes to virtualized servers to storage networks and more. The Usage and Accounting Manager Virtualization offerings seamlessly integrate into it.
Whether you are trying to reduce energy consumption in your data center, or in your transportation around town, these innovations can help you stay "green".
A few weeks ago, my Tivo(R) digital video recorder (DVR) died. All of my digital clocks in my house were flashing 12:00 so I suspect it wasa power strike while I was at the office. The only other item to die was the surge protector,and so it did what it was supposed to do, give up its own life to protect the rest of myequipment. Although somehow, it did not protect my Tivo.
I opened a problem ticket with Sony, and they sent me instructions on how to send itover to another state to get it repaired.Amusingly, the instructions included "Please make a backup of the drive contents beforesending the unit in for repair." Excuse me? How am I supposed to do that, exactly?
My model has only a single 80GB drive, and so my friend and I removed the drive and attachedit to one of our other systems to see if anything was salvageable. It failed every diagnostictest. There was just not enough to read to be usable elsewhere.
This is typical of many home systems. They are not designed for robust usage, high availability, nor any form of backup/recovery process. Some of the newer models havetwo drives in a RAID-1 mode configuration, but most have many single points of failure.
And certainly, it is not mission critical data. Life goes on without the last few episodesof Jack Bauer on "24", or the various Food Network shows that I recorded for items I planto bake some day. For the past few weeks, I have spent more time listening to the radioand reading books. Somehow, even though my television runs fine without my Tivo, watchingTV in "real time" just isn't the same.
I suspect that if you gave someone a method to do the backup, most would not bother to useit. People are now relying more and more heavily on their home-basedinformation storage systems, digital music, video and cherished photographs. Perhaps experiencing a "loss" will help them appreciate backup/recovery systems so much more than they do today.
(Chris doesn't actually name who is his source making such a claim, whether thatsomeone was employed by any of the parties involved at the time the events occurred,or is currently employed by a competitor like EMC bitterly jealous of the success IBM and HDScurrently enjoy with their offerings.)
As I already posted before about IBM'slong history of storage virtualization, SAN Volume Controller was really part of a sequence of major product in this area, after the successful 3850 MSS and 3494 VTS block virtualization products.
In the late 1990's, our research teams in Almaden, California and Hursley, UK were exploring storagetechnologies that could take advantage of commodity hardware parts and the industry-leadingLinux operating system.
As is often the case, while IBM was working on "the perfect product", small start-ups announce "not-yet-perfect" products into the marketplace. Tactical moves like partneringwith DataCore was a smart move, for the following reasons:
Helps identify market segments. Identify which subset of customers would most benefit fromdisk virtualization. While our 3850 MSS and 3494 VTS were focused on mainframe customers, this newtechnology was focused on distributed Unix, Windows and Linux servers.
Helps prioritize market requirements. What are the most appealing features?What drives clients to buy disk virtualization for distributed systems platforms?
Helps evaluate packaging options. Should we deliver pure software and expect customersto purchase their own servers? Should we offer this as a "service offering" with installation anddeployment services included? Should we offer this as hardware with software pre-installed?
The partnership proved worthwhile, not just to prove to IBM that this was a worthwhile market to enter, but also how "NOT" to package a solution. Specifically, DataCore SANsymphony was software that you had to install on your own Windows-based server. The client was left with the task of orderinga suitable Intel-based server, with the right amount of CPU cycles, RAM and host bus adapter ports,and configure the Windows operating system and DataCore software.
It didn't go well. Basically, customers were expected to be their own "hardware engineers", having to knowway too much about storage hardware and software to design a combination that worked for theirworkloads. Most clients were disappointed with the amount of effort involved, and the resulting poor performance.
To fix this, IBM delivered the SAN Volume Controller, with an optimized Linux operating system and internally-writtensoftware that runs on IBM System x(tm) server hardware optimized for performance.
I can't speak for HDS, but I suspect they came to similar conclusions that resulted in a similar decisionto build their product in-house. I welcome Hu Yoshida to correct me if I am wrong on this.
Federal Rules for Civil Procedures (FRCP) will increase adoption of unstructured data classification, email archive systems and CAS.
CAS continues to flounder, but the rest I can agree with. Regulations are being adopted world wide. Japan has its own Sarbanes-Oxley (SOX) style legislation go into effect in 2008.IBM TotalStorage Productivity Center for Data is a great tool to help classify unstructured file systems. IBM CommonStore for email supports both Microsoft Exchange and Lotus Domino, and can be connected to IBM System Storage DR550 for compliance storage.
Unified storage systems (combined file and block storage target systems) will become increasingly attractive in 2007, because of their ease of use and simplicity.
I agree with this one also. Our sales of IBM N series in 2006 was great, and looking to continue its strong growth in 2007. The IBM N series brings together FCP, iSCSI and NAS protocols into one disk system. With the SnapLock(tm) feature, N series can store both re-writable data, as well as non-erasable, non-rewriteable data, on the same box. Combine the N series gateway on the front-end with SAN Volume Controller on the back-end, and you have an even more powerful combination.
Distributed ROBO backup to disk will emerge as the fastest growing data protection solution in 2007.
IDC had a similar prediction for 2006. ROBO refers to "Remote Office/Branch Office", and so ROBO backup deals with how to back up data that is out in the various remote locations. Do you back it up locally? or send it to a central location?Fortunately, IBM Tivoli Storage Manager (TSM) supports both ways, and IBM has introduced small disk and tape drives and auto-loaders that can be used in smaller environments like this. I don't know whether "backup to disk" will be the fastest growing, but I certainly agree that a variety of ROBO-related issues will be of interest this year.
2007 will be remembered as the year iSCSI SAN took off because of the much reduced pricing for 10 Gbit iSCSI and the continued deployment of 10 Gbit iSCSI targets.
While I agree that iSCSI is important, I can't say 2007 will be remembered for anything.We have terrible memory in these things. Ask someone what year did Personal Computers (PC) take off, and they will tell you about Apple's famous 1984 commercial. Ask someone when the Internet took off, cell phones took off, etc, and I suspect most will provide widely different answers, but most likely based on their own experience.
For the longest time, I resisted getting a cell phone. I had a roll of quarters in my car, and when I needed to make a call, I stopped at the nearby pay-phone, and made the call. In 1998, pay phones disappeared. You can't find them anymore. That was the year of the cell phones took off, at least for me.
Back to iSCSI, now that you can intermix iSCSI and SAN on the same infrastructure, either through intelligent multi-protocol switches available from your local IBM rep, or through an N series gateway, you can bring iSCSI technology in slowly and gradually. Low-cost copper wiring for 10 Gbps Ethernet makes all this very practical.
Another up-and-coming technology is AoE, or ATA-over-Ethernet. Same idea as iSCSI, but taken down to the ATA level.
CDP will emerge as an important feature on comprehensive data protection products instead of a separate managed product.
Here, CDP stands for Continuous Data Protection. While normal backups work like a point-and-shoot camera, taking a picture of the data once every midnight for example. CDP can record all the little changes like a video camera, with the option to rewind or fast-forward to a specific point in the day. IBM Tivoli CDP for Files, for example, is an excellent complement to IBM Tivoli Storage Manager.
The technology is not really new, as it has been implemented as "logs" or "journals" on databases like DB2 and Oracle, as well as business applications like SAP R/3.
The prediction here, however, relates to packaging. Will vendors "package" CDP into existing backup products, possibly as a separately priced feature, or will they leave it as a separate product that perhaps, like in IBM's case, already is well integrated.
The VTL market growth will continue at a much reduced rate as backup products provide equivalent features directly to disk. Deduplication will extend the VTL market temporarily in 2007.
VTL here refers to Virtual Tape Library, such as IBM TS7700 or TS7510 Virtualization Engine. IBM introduced the first one in 1997, the IBM 3494 Virtual Tape Server, and we have remained number one in marketshare for virtual tape ever since. I find it amusing that people are now just looking at VTL technology to help with their Disk-to-Disk-to-Tape (D2D2T) efforts, when IBM Tivoli Storage Manager has already had the capability to backup to disk, then move to tape, since 1993.
As for deduplication, if you need the end-target box to deduplicate your backups, then perhaps you should investigatewhy you are doing this in the first place? People take full-volume backups, and keep to many copies of it, when a more sophisticated backup software like Tivoli Storage Manager can implement backup policies to avoid this with a progressive backup scheme. Or maybe you need to investigate why you store multiple copies of the same data on disk, perhaps NAS or a clustered file system like IBM General Parallel File System (GPFS) could provide you a single copy accessible to many servers instead.
The reason you don't see deduplication on the mainframe, is that DFSMS for z/OS already allows multiple servers to share a single instance of data, and has been doing so since the early 1980s. I often joke with clients at the Tucson Executive Briefing Center that you can run a business with a million data sets on the mainframe, but that there wereprobably a million files on just the laptops in the room, but few would attempt to run their business that way.
Optical storage that looks, feels and acts like NAS and puts archive data online, will make dramatic inroads in 2007.
Marc says he's going out on a limb here, and that's good to make at least one risky prediction. IBM used to have anoptical library emulate disk, called the IBM 3995. Lack of interest and advancement in technology encouraged IBM to withdraw it. A small backlash ensued, so IBM now offers the IBM 3996 for the System p and System i clients that really, really want optical.
As for optical making data available "online", it takes about 20 seconds to load an optical cartridge, so I would consider this more "nearline" than online. Tape is still in the 40-60 second range to load and position to data, so optical is still at an advantage.
Optical eliminates the "hassles of tape"? Tape data is good for 20 years, and optical for 100 years, but nobody keeps drives around that long anyways. In general, our clients change drives every 6-8 years, and migrate the data from old to new. This is only a hassle if you didn't plan for this inevitable movement. IBM Tivoli Storage Manager, IBM System Storage Archive Manager, and the IBM System Storage DR550 all make this migration very simple and easy, and can do it with either optical or tape.
The Blue-ray vs. DVD debate will continue through 2007 in the consumer world. I don't see this being a major player in more conservative data centers where a big investment in the wrong choice could be costly, even if the price-per-TB is temporarily in-line with current tape technologies. IBM and others are investing a lot of Research and Development funding to continue the downward price curve for tape, and I'm not sure that optical can keep up that pace.
Well, that's my take. It is a sunny day here in China, and have more meetings to attend.
Continuing my week's theme on travel, conferences, and Japan, I saw two items in the newsthat seem to follow a common theme.
According to the "The Daily Yomiuri", a local Japanese paper, "double happy weddings" arebecoming more and more popular in Japan. These would be called "stotgun" weddings in the US, butin Japan, couples pay extra to have a wedding between the fifth and seventh month ofpregnancy. As Dave Barry would say, I am not making this up. 27% of couples in Japan got married while or after pregnant. The logic is that they can celebrate both events with one ceremony. Many couples believe that the primary purpose of marriage is to have children, and somethat fail to have children suffer terrible anguish or divorce. Waiting untilbeing pregnant helps ensure the couple will be "successful" in this regard.
IBM acquires Softek, a software company that develops a product called Transparent Data MoverFacility (TDMF) to move mainframe data from one disk system to another, while applicationsare running. This can be used, for example, to move data from outdated disk systems to IBMdisk systems. This is not to be confused with IBM's archive and retention software partner,Princeton Softech.
Softek is the software spin-off of Fujitsu (a Japanese computer hardware manufacturer). Fora while, Fujitsu made IBM-compatible mainframe servers, but was not successful at developingits own system software, relying heavily on IBM for this. Unable to compete against IBM, it stoppedmaking mainframe servers, but continues making other kinds of hardware equipment.
With TDMF, the process of moving data is simple. The software runs on z/OS and intercepts all writes intendedfor a source volumes on the old array, and re-directs a copy to destination volumes on the new device.Systems can run with old and new equipment side by side for a few weeks, with the new devicestaying in-sync with the old. When the client is ready to cross over, the systems arepointed to the new disk, and the old disk systems are detached and removed from the sysplex.
Afraid that installing TDMF will mess with your applications? IBM Global Technology Services (GTS)is able to roll-in a separate mainframe, move the data, than disconnect it along with the old storage.
(For customers running Linux, UNIX or Windows on other platforms, IBM offers SAN Volume Controller (SVC).While SVC is not marketed as a "data migration device", per se, it does have this capability.Many clients were able to cost-justify purchase of an SVCto move data from old storage to new in similar fashion to how TDMF works on the mainframe.)
What do these stories have to do with one another, other than both relating to Japan? IBM has beenusing TDMF for years as part of a service offering to move data from one disk system to another.Since Sam Palmiasano took over in 2002, IBM has acquired 51 companies, 31 of them software companies.Often, these have been "successful" turning quickly profitable because IBM was already well familiar with the companies they acquire, in much the same way that husbandsare well familiar with their brides-to-be at a "double happy wedding".
So, welcome Softek! It looks like its time to celebrate again!
I would fall into the "not for me" category, at least at this time. The iPhone is GSM-capable phone with the ability to store 4GB or 8GB of music, photos and video, and has incorporated a 2 megapixel camera. Currently, I have separate components:
A cell phone that is GSM plus CDMA, with features like "speakerphone" which I use quite a lot, but NO camera.
A 7 megapixel camera, also very small, with removable memory cards.
A 60GB iPod, with music and photos. My model is older and doesn't handle videos.
Since I visit government agencies, research and development labs, and other places that don't allow cameras, I have to either chose a cell phone that does not have camera capability in it, or have a camera phone that I leave behind in the car or at the front desk. I have chosen to get cell phones with NO camera. So, NOT having a camera is a primary feature I look for, but this is getting harder and harder these days. I don't know if Apple plans to have a non-camera version of their iPhone, but that would be a deal-breaker for me.
I do carry a separate camera, and where it is permissible, use it separately. This is especially useful if you do a lot of whiteboard or flipchart presentations, and want to capture what you have written for later. (For a great example of how effectively whiteboards can be used, check out these videos from UPS.)A picture is worth a thousand words, and is easier to convey an idea with pictures, especially in countries that may not speak English. Last month, I got a 7 megapixel camera to replace my 5 megapixel. For my work, 2 megapixel as found in the iPhone is not detailed enough.
As for my iPod, I enjoy that I can carry 60GB of music and photos. When I go on vacations, I can bring my camera and iPod, and connect the two, transferring and viewing the pictures that I take. I can easily free up 5-10 GB of space on my iPod for photos in preparation for a trip, then replace that with music when I am back at home. I also use my iPod as a remote disk drive for my laptop on business trips. Again, the 4GB and 8GB may not be enough for what I need.
Printers were never converged into Personal Computers, but they did have their own convergence. I have a multi-function printer/scanner/fax machine. I used to have separate printer, scanner and fax machines, but now the technology is so inexpensive that it got all combined into one solution.
The same is happening for Storage Area Networking gear.
Thanks to Fibre Channel, switches and directors can handle both SCSI commands (FCP) and CCW commands (FICON). This allows the mainframe and distrbuted systems to converge their traffic onto a single network, and is less expensive than trying to maintain one network for the mainframes, and another for the distributed platforms.
On the SCSI side, there are now switches that let you have pluggable ports of different flavors. For example, you can have some ports be Fibre Channel to receive FCP, and other ports to be Ethernet to carry iSCSI. iSCSI is a protocol co-developed between IBM and Cisco to carry SCSI commands over Ethernet. Since most computers already have Ethernet "network interface cards" and most buildings are already wired with an Ethernet infrastructure, this provides a less expensive alternative to Fibre Channel.
Routers, and combination Router/Switches, can send all the FCP/FICON/iSCSI traffic over various long distances to remote data centers, using either iFCP or FCIP protocols. This is a less expensive alternative to dropping your own private "dark fiber" between the two locations, which often involves negotiating access rights to dig trenches through other people's property.
Which brings me back to Apple's iPhone. One device can make calls, watch video, and download webpages all because the networks have converged into sending all data in "packets". The network just routes packets from one place to another. It doesn't care that a packet is a voice packet, a video packet or a webpage packet. It doesn't matter.
"Users can pay for groceries and other purchases by swiping a phone over a reader that electronically communicates with a microchip on the phone. Phone owners confirm the purchase with the push of a button and the deal is complete.
The platform is the result of many years of trials around the world and will enable mobile contactless payments, remote payments, person-to-person payments, and mobile coupons."
For those of us in the northern hemisphere, yesterday was this year's Winter Solstice, representingthe shortest amount of daylight between sunrise and sunset. So today, I thought I would blog on my thoughtsof managing scarcity.
Earlier in my career, I had the pleasure to serve as "administrative assistant" to Nora Denzel for the week at a storage conference. My job was to make her look good at the conference, which if you know Nora, doesn't take much. Later, she left IBM to work at HP, and I gotto hear her speak at a conference, and the one thing that I remember most was her statement that thewhole point of "management" was to manage scarcity, as in not enough money in the budget,not enough people to implement change, or not enough resources to accomplish a task.(Nora, I have no idea where you are today, so if you are reading this, send me a note).
Of course, the flip-side to this is that resources that are in abundance are generallytaken for granted. Priorities are focused on what is most scarce. Let's examine some of theresources involved in an IT storage environment:
Capacity - while everyone complains that they are "running out of space", the truth is that most external disk attached to Linux, UNIX, or Windows systems contain only 20-40% data. Many years ago, I visitedan insurance company to talk about a new product called IBM Tivoli Storage Manager. This company had 7TB of disk on their mainframe,and another 7TB of disk scattered on various UNIX and Windows machines. In the room were TWO storage admins for
the mainframe, and 45 storage admins for the distributed systems. My first question was "why so many people forthe mainframe, certainly one of you could manage all of it yourself, perhaps on Wednesday afternoons?" Their response was that they acted as eachother's backup, in case one goes on vacation for two weeks. My follow-up question to the rest of the audience was:"When was the last time you took two weeks vacation?" Mainframes fill their disk and tape storage comfortablyat over 80-90% full of data, primarily because they have a more mature, robust set of management software, likeDFSMS.
Labor - by this I mean skilled labor able to manage storage for a corporation. Some companies I have visitedkeep their new-hires off production systems for the first two years, working only on test or development systemsonly until then. Of course, labor is more expensive in some countries than others. Last year, I was doing a whiteboard session on-site for a client in China, and the last dry-erase pen ran out of ink. I asked for another pen, and they instead sent someone to go re-fill it. I asked wouldn't it be cheaper just to buy another pen, and they said "No, labor is cheap, but ink is expensive." Despite this, China does complain that there is a shortage of askilled IT labor force, so if you are looking for a job, start learning Mandarin.
Power and Cooling - Most data centers are located on raised floors, with large trunks of electrical power and hugeair conditioning systems to deal with all the heat generated from each machine. I have visited the data centers ofclients that are forced now to make decisions on storage based on power and cooling consumption, because the coststo upgrade their aging buildings are too high. Leading the charge is IBM, with technology advancements in chips, cards, and complete systems that use less power, and generate less heat. While energy is still fairly cheap in the grand scheme of things, fears ofGlobal Warmingand declining oil supplies, the costs ofpower and cooling have gotten some news lately. In 1956, Hubbert predicted US would reach peak oil supplies by1965-1970 (it happened in 1971), and this year Simmonsestimated that world-wide oil production began its decline already in 2005. Smart companies like Google have movedtheir server farms to places like Oregon in the Pacific Northwest for cheaper hydroelectric power.
Bandwidth - Last year IBM introduced 4Gbps Fibre Channel and FICON SAN networking gear, along with the servers and storage needed to complete the solution. 4Gbps equates to about 400 MB/sec in data throughput. By comparison, iSCSI is typically run on 1Gbps Ethernet, but has so much overheads that you only get abour 80 MB/sec. Next year, we may see both 8 Gbps SAN, and 10 GbE iSCSI, to provide 800 MB/sec throughputs. My experience is that the SAN is not the bottleneck, instead people run out of bandwidth at the server or storage end first. They may not have a million dollars to buy the fastest IBM System p5 servers, or may not have enough host adapters at the storage system end.
Floorspace - I end with floorspace because it reminds me that many "shortages" are temporary or artificially created. Floorspace is only in short supply because you don't want to knock down a wall, or build a new building, to handle your additional storage requirements.In 1997, Tihamer Toth-Fejel wrote an article for the National Space Society newsletter that estimated that ...Everybody on Earth could live comfortably in the USA on only 15% of our land area, with a population density between that of Chicago and San Francisco. Using agricultural yields attained widely now, the rest of the U.S. would be sufficient to grow enough food for everyone. The rest of the planet, 93.7% of it, would be completely empty.Of course, back in 1997 the world population was only 5.9 billion, and this year it is over 6.5 billion.
This last point brings me back to the concept of food, and I am not talking about doughnuts in the conference room, or pizza while making year-end storage upgrades. I'm talking aboutthe food you work so hard to provide for yourself and your family. The folks at Oxfam came up with a simpleanalogy. If 20 people sit down at your table, representing the world’s population:
3 would be served a gourmet, multi-course meal, while sitting at decorated table and a cushioned chair.
5 would eat rice and beans with a fork and sit on a simple cushion
12 would wait in line to receive a small portion of rice that they would eat with their hands while sitting on the floor.
So for those of you planning a special meal next Monday, be thankful you are one of the lucky three, and hopefulthat IBM will continue to lead the IT industry to help out the other seventeen.
It has always been the case in fast pace technology areas that you can't tell the players without a program card, andthis is especially true for storage.
When analyzing each acquistion move, you need to think of what is driving it. What are the motives?Having been in the storage business 20 years now, and seen my share of acquisitions, both from within IBM,as well as competition, I have come up with the following list of motives.
Although slavery was abolished in the US back in the 1800's, and centuries earlier everywhere else, many acquisitionsseem to be focused on acquiring the people themselves, rather than the products or client list. I have seen statistics such as "We retained 98% of the people!" In reality, these retentions usually involve costly incentives,sign-in bonuses, stock options, and the like. Desptie this, people leave after a few years, often because ofpersonality or "corporate culture" clash. For example, many former STK employees seem to be leaving after their company was acquired by Sun Microsystems.
If you can't beat them, join them. Acquisitions can often be used by one company to raise its ranking in marketshare, eliminating smaller competitors. And now that you have acquired their client list, perhaps you can sellthem more of your original set of products!
Symantec had acquired Veritas, which in turn had acquired a variety of other smaller players, and the end result is that they are now #1 backup software provider, even though none of theirproducts holds a candle to IBM's Tivoli Storage Manager. Meanwhile, EMC acquired Avamar to try to get more into the backup/recovery game, but most analysts still find EMC down in the #4 or #5 place in this category.
Next month,Brocade's acquisition of McData should take effect, furthering its marketshare in SAN switch equipment.
Prior to my current role as "brand market strategist" for System Storage, I was a "portfolio manager" where wetried to make sure that our storage product line investments were balanced. This was a tough job, as the investmentshad to balance the right development investments into different technologies, including patent portfolios.Despite IBM's huge research budget, I am not surprised that some clever inventions of new technologies comefrom smaller companies, that then get acquired once their results appear viable.
The last motive is value shift. This is where companies try to re-invent themselves, or find that they are stuck in acommodity market rut, and wish to expand into more profitable areas.
LSI Logic acquisition of StoreAge is a good exampleof this. Most of the major storage vendors have already shifted to software and services to provide customer value,as predicted in 1990's by Clayton Christensen in his book "The Innovator's Dilemma". The rest are still strugglingto develop the right strategy, but leaning in this general direction.
In last week's System Storage Portfolio Top Gun class in Dallas, some of the students were not familiarwith Really Simple Syndication (RSS). For the uninitiated, this can be intimidating.I thought a quick overview of what I've done might help:
Chose a "feed reader". I chose Bloglines but there are many others.
Use Technorati to search other blogs for keywords or phrases I am looking for.
When I find a blog that I like to continue tracking, I "add" it to my subscription list on bloglines. Just hit "add" and copy the URL of the blog you want to track. Bloglines will figure out the RSS keywords required.I track eight blogs at the momemnt, but some people with lots of time on their hands track 20 or more. It is easy to unsubscribe, so don't be afraid to try some out for a few days.
Since I was actually going to run a blog of my own, I read a few books on the topic. One I recommend is "Naked Conversations" by Robert Scoble and Shel Israel, both experienced bloggers.
Finally, I am not big on spell checking, but most places have the option to preview your post or comment before it actually gets posted, which is not a bad idea if you use any HTML tags.
For a quick taste of blogging, consider using Data Storage Blogger Feed Reader. This has a lot of blogs on the topic of storage, already added and categorized for your convenience, ready for your perusal.
I am sure there are many other ways to enjoy the Blogosphere, but this works for me.[Read More]
This week, I am in Las Vegas for [Edge 2016], IBM's Premiere IT Infrastructure conference of the year. Here is my recap of breakout sessions for Monday, Sep 19, 2016:
How do you storage a Zettabyte? IBM and Microsoft Know...
A [Zettabyte] is a million Petabytes, or a billion Terabytes, of data. Most clients I deal with have less than 10 PB of centralized storage in their data center, but there are a few that have much larger data repositories.
Ed Childers, IBM STSM and manager for Tape and LTFS development, and Aaron Ogus, Microsoft Architect, discussed different solutions developed by IBM and Microsoft. IBM's solution has been productized, and is available as IBM Spectrum Scale and IBM Spectrum Archive. Microsoft's solution is not productized, but is being "operationalized" to be used within Microsoft's Azure Cloud.
Not surprisingly, to be able to store a Zettabyte of data, you have to be creative and cost-effective with storage media. The current winner is magnetic tape, which continues to be 20 times less expensive than disk. IBM developed the Linear Tape File System (LTFS) and then shared it with other leading IT vendors. Ed also covered some future storage media developments, from using Macro-molecular strands of DNA, to Phase Change Memory (PCM).
All Flash is not Created Equal - Contrasting IBM FlashSystem with Solid State Drives (SSD)
Many IBM FlashSystem presentations focus on the product, but don't explain the underlying technology, specifically what differentiates IBM FlashSystem from substantially slower competitive alternatives like EMC XtremIO and PureStorage that are based instead on fallible commodity Solid State Drives (SSD).
By working closely with our chip vendor, Micron, IBM was able to improve the write endurance of these Multi-level cell (MLC) chips by 9.4x, and reduce write amplification by 45 percent.
I explained IBM's clever asymmetrical wear-level balancing, heat segregation, read disturb mitigation, voltage level shifting, and health binning, all of which contribute to the performance and reliability of this solution. IBM's innovative Error Correcting Code provides LDPC-like correction strength but at much faster BCH-like latency speed.
This was a popular session. Despite being moved to a much larger room, they still had to turn people away, so I will be repeating this session on Wednesday, 11:00am.
Real-time Compression: Bendingo and Adelaide Bank's Perspective
James Harris, Senior Storage Systems Specialist for [Bendingo and Adelaide Bank], presented his success story with the use of Real-time Compression. Oracle RAC databases got 60-70 percent savings. SQL databases got 70-80 percent savings. VMware VMFS datastores average 50 percent savings. For IBM i, he is getting 60-70 percent savings for SYSBAS, and over 70 percent savings of the rest of his IBM i production data.
As a result, the bank has not had to make any Capital Expenditures (CAPEX) for disk for 2-3 years since they started compressing in 2014.
Storage Options for Big Data and Analytics: IBM FlashSystem or Traditional Disk Systems?
Eric Sperley, IBM Software Defined Storage Architect, presented the basics of Hadoop and the Hadoop File System (HDFS), then explained how IBM Spectrum Scale, when combined with the right tiers of flash and disk technology, could be used to optimize an environment for big data analytics.
The Solutions EXPO is open all day, for people to visit the booths in between sessions. I stopped in for the evening reception. This is a great way to catch up on the latest products, re-connect with some clients or colleagues that I haven't seen in person for awhile, and meet new friends.
Shown here is Angie Welchert, who just started working for IBM a few years ago! I took her around to introduce her to some IBM executives at the Solutions EXPO.
As we get to larger and larger flash and spinning disk drives, a common question I get is whether to use RAID-5 versus RAID-6. Here is my take on the matter.
A quick review of basic probability statistics
Failure rates are based on probabilities. Take for example a traditional six-sided die, with numbers one through six represented as dots on each face. What are the chances that we can roll the die several times in a row, that we will have no sixes ever rolled? You might think that if there is a 1/6 (16.6 percent) chance to roll a six, then you would guarantee hit a six after six rolls. That is not the case.
# of Rolls
Probability of no sixes (percent)
So, even after 24 rolls, there is more than 1 percent chance of not rolling a six at all. The formula is (1-1/6) to the 24th power.
Let's say that rolling one to five is success, and rolling a six is a failure. Being successful requires that no sixes appear in a sequence of events. This is the concept I will use for the rest of this post. If you don't care for the math, jump down to the "Summary of Results" section below.
Error Correcting Codes (ECC) and Unreadable Read Errors (URE)
When I speak to my travel agent, I have to provide my six-character [Record Locator] code. Pronouncing individual letters can be error prone, so we use a "spelling alphabet".
The International Radiotelephony Spelling Alphabet, sometimes known as the [NATO phonetic alphabet], has 26 code words assigned to the 26 letters of the English alphabet in alphabetical order as follows: Alfa, Bravo, Charlie, Delta, Echo, Foxtrot, Golf, Hotel, India, Juliett, Kilo, Lima, Mike, November, Oscar, Papa, Quebec, Romeo, Sierra, Tango, Uniform, Victor, Whiskey, X-ray, Yankee, Zulu.
Foxtrot Golf Mike Oscar Victor Whiskey
Foxtrot Gold Mine Oscar Vector Whisker
Boxcart Golf Miko Boxcart Victor Whiskey
Having five or so characters to represent a single character may seem excessive, but you can see that this can be helpful when communications link has static, or background noise is loud, as is often the case at the airport!
If spelling words are misheard, either (a) they are close enough like "Gold" for "Golf", or "Whisker" for "Whiskey", that the correct word is known, or (b) not close enough, such that "Boxcart" could refer to either "Foxtrot" or "Oscar" that we can at least detect that the failure occurred.
For data transfers, or data that is written, and later read back, the functional equivalent is an Error Correcting Code [ECC], used in transmission and storage of data. Some basic ECC can correct a single bit error, and detect double bit errors as failures. More sophisticated ECC can correct multiple bit errors up to a certain number of bits, and detect most anything worse.
When reading a block, sector or page of data from a storage device, if the ECC detects an error, but is unable to correct the bits involved, we call this an "Unrecoverable Read Error", or URE for short.
Bit Error Rate (BER)
Different storage devices have different block, sector or page sizes. Some use 512 bytes, 4096 bytes or 8192 bytes, for example. To normalize likelihood of errors, the industry has simplified this to a single bit error rate or BER, represented often as a power of 10.
Bit Error Rate per read (BER)
Consumer HDD (PC/Laptops)
Enterprise 15k/10k/7200 rpm
Solid-State and Flash
IBM TS1150 tape
In other words, the chance that a bit is unreadable on optical media is 1 in 10 trillion (1E13), on enterprise 15k drives is 1 in 10 quadrillion, and on LTO-7 tape is 1 in 10 quintillion.
There are eight bits per byte, so reading 1 GB of data is like rolling the die eight billion times. The chance of successfully reading 1GB on DVD, then would be (1 - 1/1E13) to the 8 billionth power, or 99.92 percent, or conversely a 0.08 percent chance of failure.
In this paper, Google had studied drive failure using an "Annual Failure Rate" or AFR. Here are two graphs from this paper:
This first graph shows AFR by age. Some drives fail in their first 3-6 months, often called "infant mortality". Then they are fairly reliable for a few years, down to 1.7 percent, then as they get older, they start to fail more often, up to 8.3 percent.
This second graph factors in how busy the drives are. Dividing the drive set into quartiles, "Low" represents the least busy drives (the bottom quartile), "Medium" represents the median two quartiles, and "High" represents the busiest drives, the top quartile. Not surprisingly, the busiest drives tend to fail more often than medium-busy drives.
Given an AFR, what are the chances a drive will fail in the next hour? There are 8,766 hours per year, so the success of a drive over the course of a year is like rolling the die 8,766 times. This allows us to calculate a "Drive Error Rate" or DER:
Drive Error Rate per hour (DER)
For example, an AFR=3 drive has a 1 in 287,800 chance of failing in a particular hour. The probability this drive will fail in the next 24 hours would be like rolling the die 24 times. The formula is (1-1/287,800) to the 24th power, resulting in a failure rate of roughly 0.008 percent.
Let's take a typical RAID-5 rank with 600GB drives at 15K rpm, in a 7+P RAID-5 configuration.
During normal processing, if a URE occurs on a individual drive, RAID comes to the rescue. The system can rebuild the data from parity, and correct the broken block of data.
When a drive fails, however, we don't have this rescue, so a URE that occurs during the rebuild process is catastrophic. How likely is this? Data is read from the other seven drives, and written to a spare empty drive. At 8 bits per byte, reading 4200 GB of data is rolling the die 33.6 trillion times. The formula is then (1-1/E16) to the 33.6 trillionth power, or approximately 0.372 percent chance of URE during the rebuild process.
The time to perform the rebuild depends heavily on the speed of the drive, and how busy the RAID rank is doing other work. Under heavy load, the rebuild might only run at 25 MB/sec, and under no workload perhaps 90 MB/sec. If we take a 60 MB/sec moderate rebuild rate, then it would take 10,000 seconds or nearly 3 hours. The chance that any of the seven drives fail during these three hours, at AFR=10 rolling the DER die (7 x 3) 21 times, results in a 0.025 percent chance of failure.
It is nearly 15 times more likely to get a URE failure than a second drive failure. A rebuild failure would happen with either of these, with a probability of 0.397 percent.
The situation gets worse with higher capacity Nearline drives. Let's do a RAID-5 rank with 6TB Nearline drives at 7200 rpm, in a 7+P configuration. The likelihood of URE reading 42 TB of data, is rolling the die 336 trillion times, or approximately 3.66 percent chance of URE failure. Yikes!
The time to rebuild is also going to take longer. A moderate rebuild rate might only be 30 MB/sec, so that rebuilding a 6TB drive would take 55 hours. The chance that one of the other seven drives fail, assuming again AFR=10, during these 55 hours results in a 0.462 percent.
This time, a URE failure is nearly eight times more likely than a double drive failure. The chance of a rebuild failure is 4.12 percent. Good thing you backed up to tape or object storage!
The math can be done easily using modern spreadsheet software. The URE failure rate is based on the quantity of data read from the remaining drives, so a 4+P with 600GB drives is the same as 8+P with 300GB drives. Both read 2.4 TB of data to recalculate from parity. The Double Drive failure rate is based on the number of drives being read times the number of hours during the rebuild. Slower, higher capacity drives take longer to rebuild. However, in both the 15K and 7200rpm examples, the chance of a URE failure was 8 to 15 times more likely than double drive failure.
Many of the problems associated with RAID-5 above can be mitigated with RAID-6.
After a single drive fails, any URE during rebuild can be corrected from parity. However, if a second drive fails during the rebuild process, then a URE on the remaining drives would be a problem.
Let's start with the 600GB 15k drives in a 6+P+Q RAID-6 configuration. The chance of a second drive failing is 0.0252 percent, as we calculated above. The likelihood of a URE is then based on the remaining six drives, 3600 GB of data. Doing the math, that is 0.0319 percent chance. So, the change of a URE during RAID-6 failure is the probability of both occurring, roughly 0.0000806 percent. Far more reliable than RAID-5!
Likewise, we can calculate the probability of a triple drive failure. After the second drive fails, the likelihood of a third drive at AFR=10, results in 0.00000546 percent.
Combining these, the chance of failure of rebuild is 0.000861 percent.
Switching to 6 TB Nearline drives, in a 6+P+Q RAID-6 configuration, we can do the math in the same manner. The likelihood of URE and two drives failing is 0.0145 percent, and for triple drive failure is 0.00183 percent. Chance of rebuild failure is 0.0163 percent.
Summary of Results
Putting all the results in a table, we have the following:
RAID-5 rebuild failure (percent)
RAID-6 rebuild failure (percent)
600GB 15K rpm
6 TB 7200rpm
Hopefully, I have shown you how to calculate these yourself, so that you can plug in your own drive sizes, rebuild rates, and other parameters to convince yourself of this.
In all cases, RAID-6 drastically reduced the probability of rebuild failure. With modern cache-based systems, the write-penalty associated with additional parity generally does not impact application performance. As clients transition from faster 15K drives to slower, higher capacity 10K and 7200 rpm drives, I highly recommend using RAID-6 instead of RAID-5 in all cases.
IBM has chosen three particular Software Defined Environments. At one end, IBM is a platinum sponsor of OpenStack which supports x86 servers, POWER systems and z System mainframes. A problem with open source projects like this, however, is that they can be a bit like putting together IKEA furniture from pieces in a box: "Some assembly required."
At the other end, highly proprietary environments from VMware and Microsoft bring enterprise-ready out-of-the-box solutions. However, nobody wants to be limited to just x86-based solutions. IBM offers the best of both worlds, basing its IBM Cloud and SmartCloud software on OpenStack standards, but providing enterprise-ready solutions for x86, POWER Systems and z System mainframes. This includes IBM Cloud Manager with OpenStack, IBM Cloud Orchestrator, and IBM SmartCloud Cost Management software products.
(Analogy: If open source solutions were vanilla ice cream, and proprietary solutions were chocolate ice cream, then IBM Cloud and SmartCloud is vanilla ice cream with chocolate sauce on top! This is the same approach IBM used for WebSphere Application Server, based on Apache web server, and IBM BigInsights, based on Hadoop analytics.)
For some people, software defined can also refer to how the resources are deployed. Rather than using specialized hardware, solutions based on industry-standard hardware can be delivered either as pre-built appliances, services in the Cloud, or as software-only products.
Back in the 1990s, IBM came up with the [Seascape Storage Enterprise Architecture], deciding to focus the design of its storage systems to be based, where possible and practical, on industry-standard components.
Let's review a few products:
IBM SAN Volume Controller (SVC) and Storwize V7000: IBM storage hypervisors were originally designed to run on industry-standard x86 servers. The IBM scientists at Almaden Research Center referred to this as the "COMmodity PArts Storage System" (COMPASS) architecture.
That is still mostly true 12 years later, but SVC and Storwize V7000 does have specialized hardware, including host bus adapter cards and the [Intel® QuickAssist] chip for Real-time Compression.
IBM DS8000 disk system: The DS8000 is based on off-the-shelf IBM POWER servers. Originally, you could only purchase POWER-based servers from IBM, but now thanks to the [OpenPOWER Foundation], you now have more options.
The DS8000 does use some specialized hardware for its host and device adapters, taking advantage of ASICs and FPGAs to optimize performance.
IBM XIV storage system: IBM acquired XIV back in 2008, but its design is very similar to Seascape architecture. All of the Intellectual Property was in the software, installed on industry-standard x86 servers, cache memory, host bus adapters and 7200 RPM nearline disk drives. I joked that the entire hardware bill-of-materials could be ordered directly from the CDW catalog!
IBM FlashSystem: IBM is #1 rank in the All-Flash Array market. Rather than using off-the-shelf commodity Solid-State drives (SSD), the IBM FlashSystem employs specialized hardware based on FPGAs to optimize performance.
IBM FlashSystem came from the recent acquisition of Texas Memory Systems, and was not designed under the IBM Seascape architecture.
Combining the method the resources are controlled and managed with the way storage is deployed results in a quadrant. Let's take a look at this from a storage perspective:
Traditional storage products that are based on specialized hardware that do not support Software Defined Environment APIs.
Storage products that are based on specialized hardware, but have been enhanced to support Software Defined Environment APIs. For OpenStack, this refers to Cinder and Swift interfaces. For VMware, this would include VAAI, VASA and VADP interfaces and vCenter Console plug-ins.
Storage products that are basically software, either installed on pre-built hardware appliances, offered as services in the Cloud, or software you deploy on your own industry-standard hardware. Unfortunately, this category does not support software defined environment APIs, and so proprietary interfaces require administrator-intensive involvement instead.
Storage software for industry-standard hardware. You purchase the appropriate server, cache memory, flash and disk drives as needed. This category could also extend to pre-built appliance versions of this software, or as services in the Cloud. APIs for software defined environments are available to deploy this with self-service automation.
IBM Spectrum Storage is a family of Category IV software offerings. Here are the products announced:
Based on technology from...
IBM Spectrum Control™
Simplified control and optimization of storage and data infrastructure
SmartCloud Virtual Storage Center, Tivoli Storage Productivity Center
IBM Spectrum Protect™
Single point of administration for data backup and recovery
Tivoli Storage Manager
IBM Spectrum Accelerate™
Accelerating speed of deployment and access to data for new workloads
XIV storage system
IBM Spectrum Virtualize™
Storage virtualization that frees client data from IT boundaries
SAN Volume Controller
IBM Spectrum Scale™
High-performance, scalable storage manages exabytes of unstructured data
GPFS and codename:Elastic Storage
IBM Spectrum Archive™
Enables easy access to long term storage of low activity data
Linear Tape File System (LTFS)
Last year, IDC recognized IBM as #1 in this new emerging software defined storage market. This announcement reinforces IBM's lead in this area. See the [Press Release] for details.
We have a lot to cover, so I will do the quick recap today, and then go in-depth on subsequent posts.
IBM FlashSystem 840 and V840
The FlashSystem now offers a high-voltage 1300W power supply. There are two supplies providing redundancy. In the unlikely event that you are doing maintenance on one of them, the other supply handles all the workload. With the original power supply, the system slowed down the clock speeds to reduce electrical demand. The new power supplies can handle full performance.
Also, the Graphical User Interface (GUI) now holds 300 days of performance data with pan-and-zoom capability. Five predefined graphs showing key performance metrics with additional user-defined metrics available for visualization.
The new v7.4 level of microcode combines features from v7.2.7 and v7.3 into a single code base.
In previous 3-site mirroring implementations, you had A-to-B-to-C cascading. Metro Mirror would get the data from A-to-B, then Global Mirror would copy B-to-C. Multiple Target Peer-to-Peer Remote Copy (PPRC) feature number 7025 allows you to have two separate paths of the data: A-to-B and separately A-to-C. Some folks refer to this as a "star" configuration.
For System z mainframe clients, the new v7.4 introduces new zHyperWrite for DB2 database logs, enhances zGM (XRC) write pacing, and extends Easy Tier automated-tiering API to allow z/OS applications to influence placement on different tiers of storage.
The High Performance Flash Enclosures (HPFE) that IBM introduced last May for the "A" frames are now available for "B" frames. You can have four HPFE in A, and another 4 in B.
DS8870 now offers 600 GB 15K rpm SAS and 1.6 TB 2.5-inch SSD encryption drives for additional capacity and cost performance options to meet data growth demands within the same space. Both support data-at-rest encryption.
Lastly, we have upgraded the OpenStack Cinder driver to the latest Juno release, including features like volume replication and volume retype.
The latest SAN switch is a slim 1U high box that can be configured with 12 or 24 ports. These are 16Bps ports that can auto-negotiate down to 8Gbps, 4Gbps and 2Gbps. These are easy to set up, and can be managed with the IBM Network Advisor management software.
GPFS is the core technology for IBM's "Codename: Elastic Storage" initiative.
You have several options. First, you can purchase just the GPFS software itself. It runs natively on AIX, Windows and Linux, and can be extended to support other operating systems through the use of NAS protocols like NFS or CIFS. Today, the Linux support which was previously just x86 and POWER has been extended to include Linux on System z mainframes as well.
GPFS v4.1 offers "Native RAID" support, with de-clustered RAID in 8+2P and 8+3P configurations. Like the IBM XIV Storage System, this scatters the data across many drives, and can tolerate drive failures better than traditional RAID-5 configurations.
Another option is to get a pre-configured "Converged" appliance that combines servers, storage and hardware. We already offer SONAS and the Storwize V7000 Unified, but IBM now offers the "GPFS Storage Server" running on the new P822L Linux-on-Power servers, RHEL v7, and and GPFS v4.1 with Native RAID to twin-tailed attached DCS3700 expansion drawers. Since GPFS provides the RAID, no need for DCS37000 controllers, saving clients substantial costs.
The IBM Storwize family includes SAN Volume Controller, Storwize V7000, Storwize V7000 Unified, Storwize V5000, Storwize V3700 and Storwize V3500.
The big announcement is that IBM now offers data-at-rest encryption for block data on internal drives in the new generation of Storwize V7000 and V7000 Unified models. There is no performance impact, and no need to purchase new SED-capable drives.
Data-at-rest encryption helps in several ways. First, it protects data if a drive is pulled out and taken away maliciously. Second, it protects data if the drive fails and you want to send it back to the manufacturer for replacement. Third, it allows you to perform a "secure erase" so that the data can be sold or re-purposed without fear of anyone reading previous data.
Initially, the encryption key management is built-in, with the keys stored on a USB memory stick physically attached to the model. In the future, IBM will extend this support to SVC, extend this support to external virtualized drives, and extend this support to IBM Security Key Lifecycle Manager (SKLM).
Other announcements include 16Gbps adapters for SVC, Storwize V7000 and V7000 Unified. The entire Storwize family will also enjoy both 1.8TB 10K RPM 2.5-inch drives, and 6TB 7200RPM 3.5-inch drives
See the Announcement Letter (available later this month) for details.
New TS1150 enterprise tape drives
The anticipation is over! The new TS1150 tape drive has been announced, with 10TB raw un-compressed "JD" media cartridge capacity and 360 MB/sec throughput performance. The new drive is read/write compatible with TS1140 on JC, JY and JK media cartridges.
For the virtual tape libraries for the System z platform, IBM offers two models. The TS7740 had a small amount of disk front ending tape library of physical tape. The TS7720 had a large amount of disk with no tape library.
But then the person carrying the chocolate bar bumped into the person carrying the jar of peanut butter, and the rest is history. IBM will now allow tape attach on TS7720, best of both worlds! Large disk cache plus tape library attach.
Tape-attached TS7720 configurations can have up to eight partitions, with different partitions have different policies. Some might move data from disk cache to tape more aggressively, while other partitions may keep data on disk for longer periods, or indefinitely if needed.
Logical tape volumes can now be up to 25GB in size.
The DCS3700 is IBM's entry-level disk system for sequential-oriented workloads. Today, IBM announced new disk drive options: 400GB 2.5-inch SSD, 800 GB 2.5-inch SSD, and 1.2TB 10K RPM 2.5-inch drive. All of these offer T10 Protection Information (PI) data integrity.