Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is a Master Inventor and Senior IT Specialist for the IBM System Storage product line at the IBM Executive Briefing Center in Tucson, Arizona, and a featured contributor to IBM's developerWorks. In 2011, Tony celebrated his 25th anniversary with IBM Storage on the same day as IBM's Centennial. He is the author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services. You can also follow him on Twitter @az990tony.
(Short URL for this blog: ibm.co/Pearson)
And, it's not too late to sign up for IBM Tivoli's [Pulse 2008] conference that will be held in Orlando, Florida, May 18-22, 2008. I'll be there Sunday and Monday only, in the Tivoli Storage track, so if you are planning to attend and wish to meet up with me while I am there, please send me a note!
[Earth Day] is celebrated in many countries on April 22, which marks the anniversary of the birth of the modern environmental movement in 1970. Others celebrate this on the March equinox.
IBM has finally aggregated everything that we are doing around "Green" initiatives onto a single [IBM Green] landing page. This covers everything from IBM's own activities to what we sell to our clients.
Also, to mark this occasion, IBM held an internal contest for employees to make videos about Earth Day, the environment, and IT's role in making the situation better. The grand prize winner, and 10 second prize winners, are available on this [IBM Green Contest - YouTube channel]. Of these, I liked "New Life for Old Silicon" (shown here on the left).
IBM also developed [Power Up, the Game], which is the Earth Day Network's "official" game for today's festivities. It's a 3-D game created by IBM Research to help save a fictitious planet - the goal being to help students learn about ecology and climate change. IBM also hopes the game will motivate young students to get interested in math, science and technology. Eightbar has a great post [PowerUp - A serious game out in the wild] discussing this. Here's also a 3-minute [the making of "Power Up, the game" video] to get a behind-the-scenes look. The game is a downloadable Windows client that then connects to the main servers to run.
You could buy 10 liters of gasoline in Venezuela with this coin.
I'm back from South America, and am now in Chicago, Illinois. I'm having breakfast at the Starbucks downtown, and thought I would make a post before all of my meetings today.
On this trip, I met with IBM Business Partners and sales reps from Argentina, Colombia, Ecuador and Venezuela. While I have visited the first three countries on past trips, this was my first time in Caracas, Venezuela. I grew up in La Paz, Bolivia, and speak Spanish fluently, so I had no problem getting around and holding discussions with everyone. While my friends in the US are often surprised I speak multiple languages, it doesn't surprise anyone I visit in other countries. If you are going to have worldwide job responsibilities for a global company that does business in over 180 countries, the least you could do is learn a few additional languages. I suspect the majority of the 350,000 IBM employees speak at least two languages, the exceptions being mostly the 50,000 or so employees that live in the United States.
I flew on American Airlines from Tucson to Dallas to Caracas, and was only slightly delayed as a result of all of the flight cancellations that happened earlier that week. Some companies designate a single "official airline" for their employees to use. That makes sense if all of your employees are located in a single city, and that city is the hub for your designated airline. IBM is too big, too spread out, and sells technology to nearly every airline to make such a designation. Instead, IBM tries to spread its business out to multiple carriers, although all of my colleagues seem to have their own personal favorites. Mine are American Airlines, Singapore Airlines and Cathay Pacific.
While other people were upset over the delays, I found American Airlines did a great job keeping me informed, and all their employees I talked to seemed to be handling the situation fairly well. If you fly on American, I recommend you sign up for "text message" notifications. I did this for every leg of my trip, and was kept up to date on times, gates and status. Very helpful! American Airlines even started their own corporate blog: [AA Conversation] (Special thanks to my friend [Paul Gillen] for pointing this out)
(I read somewhere that if you are going to travel anywhere, you need to remember to bring both your sunscreen and your sense of humor, otherwise you are going to get burned. Good advice! Trust me, you don't even know how bad it can really be until you travel in the third world.)
Anyhoo, last week, IBM Venezuela celebrated its 70th anniversary. That's right, IBM has been doing business in Venezuela for the past 70 years. Also last week, IBM put out its impressive [1Q08 quarterly results], including 10 percent growth for the IBM System Storage product line worldwide, comparing what IBM earned this first quarter to what IBM earned the first quarter of last year. For just the Latin American countries, the growth for IBM System Storage was 20 percent! There are a lot of oil and gas companies in Venezuela. With a barrel of oil selling at more than $117 US dollars, these companies are looking to spend their newly earned profits on IBM systems, software and services.
As for the picture above, that is a one-thousand Bolivares coin, worth about 47 US cents at this week's official exchange rate. As with many Latin American countries going through [years of high inflation], Venezuela was tired of all those zeros on their money. For example, a cheeseburger, freedom fries and a Coke at McDonald's would set you back 20,000 Bolivares. This year the Venezuelan government created a new currency called "Bolivares Fuertes" (VEF), lopping off the last three zeros. So, the coin above would be replaced by a new coin with a big "1" on it instead, and an old 2000 Bolivares bill would be replaced by a new 2 Bolivares Fuertes bill. Unfortunately, I had to give all my new Venezuelan money back at the airport upon leaving, but they let me keep the coin above, since it is old money, as a souvenir so that I could use it as a ball mark for playing golf.
(The term Bolivares is named after Simon Bolivar, who was born in Caracas. He is famous throughout South America, and was, and I am not making this up, the first president of Colombia, the second president of Venezuela, the first president of Bolivia, and the sixth president of Peru. Here is the [Wikipedia article] to learn more.)
Gasoline costs a mere 100 old Bolivares per liter. For those who don't do metric, gasoline therefore costs less than 18 cents per gallon. By comparison, in the USA, the average today was $3.47 US dollars per gallon, of which 18.4 cents is Federal tax. That's right, we pay more just in taxes for gasoline than los venezolanos pay for it all.
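For readers who want to check the arithmetic, here is a minimal sketch in Python. The coin value, pump price, US average and Federal tax figures come from this post; the liters-per-gallon factor is the standard US gallon, and the function names are just illustrative.

```python
# Figures quoted in the post.
COIN_BOLIVARES = 1000            # face value of the old coin
COIN_USD = 0.47                  # its value at the official exchange rate
PRICE_BS_PER_LITER = 100         # old Bolivares per liter of gasoline
US_PRICE_PER_GALLON = 3.47       # US average, dollars per gallon
US_FEDERAL_TAX_PER_GALLON = 0.184

# Standard conversion factor (not from the post).
LITERS_PER_US_GALLON = 3.785411784


def liters_per_coin():
    """How many liters of gasoline one old 1000-Bolivares coin buys."""
    return COIN_BOLIVARES / PRICE_BS_PER_LITER


def venezuela_usd_per_gallon():
    """Venezuelan pump price converted to US dollars per US gallon."""
    usd_per_bolivar = COIN_USD / COIN_BOLIVARES
    usd_per_liter = PRICE_BS_PER_LITER * usd_per_bolivar
    return usd_per_liter * LITERS_PER_US_GALLON


if __name__ == "__main__":
    print(f"Liters per coin: {liters_per_coin():.0f}")
    print(f"Venezuela: ${venezuela_usd_per_gallon():.3f} per gallon")
    print(f"US Federal tax alone: ${US_FEDERAL_TAX_PER_GALLON:.3f} per gallon")
```

Under these figures the Venezuelan price works out to just under 18 cents per gallon, which is slightly less than the US Federal tax by itself.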
The side effect of cheap gas is bad traffic. Everybody in Venezuela drives their own car, and nobody thinks about the price of gasoline, carpooling, or taking public transportation, acting much like Americans used to, up until a few years ago. With some of the gridlock we faced, it might have been faster (but not safer) to walk there instead.
Which makes me wonder if American Airlines fills up their airplanes with fuel at these lower prices when they pick up people in Caracas to take them back to the United States. In 2002, fuel represented 10 percent of the average airline's operating expenses, but today it is 25 percent. That is a drastic increase!
The same is happening in data centers. In the past, electricity was so cheap, and such a small percent of the total IT budget, that nobody gave it much thought. But as electricity usage increased and the cost per kWh went up, the two multiplied together, and power and cooling costs are now growing four times faster than the average IT hardware budget.
Normally, IBM only makes announcements on Tuesdays, but today, Friday, IBM announced that it acquired Diligent Technologies. What? I got a lot of questions about this, so I thought I would start with this...
When I posted in January that [IBM Acquires XIV], fellow blogger Mark Twomey of EMC, of StorageZilla fame, sent me a comment:
"Ah now Tony I wasn't poking fun. Indeed I find it fascinating that Moshe who's been sitting out on the fringes for years having been banished for being an obstructionist to EMC entering the mid-market is now back.
Which reminds me what happens with Diligent? There his as well aren't they or has he packed his stake in that in?"
As you might have guessed, I am privy to a lot of stuff going on behind the scenes at IBM that I can't talk about in this blog, and the rumors in the blogosphere about IBM's acquisition of Diligent were a topic I couldn't officially recognize, defend or deny until official IBM announcements were made.
In his latest post, Mark wonders about [the last Tape and Mainframe sales person on earth]. He recounts my interaction with fellow blogger Hu Yoshida of HDS about the energy benefits of Virtual Tape Libraries. Knowing that we were going to announce IBM's acquisition of Diligent soon, I thought this would be a worthy exchange, driving up the sales of Diligent boxes (whether you buy them from IBM or HDS). Diligent already had reselling arrangements with HDS, and IBM plans to continue those arrangements going forward with HDS. As I have explained before in my post [Supermarkets and Specialty Shops], IBM and HDS cater to different customers, so if a customer wants the best technology from a specialty shop, they can buy IBM Diligent products from HDS, but if they want one-stop shopping, they can buy IBM Diligent directly from IBM or its other IBM Business Partners.
(Perhaps a more tricky situation is that Diligent also had an arrangement with Sun Microsystems, which competes directly against IBM as another IT supermarket vendor, but I have not heard how IBM has decided to handle this going forward.)
For more on this intricate mess of interconnected companies, alliances and partnerships, read Dave Raffo's article [Data dedupe dance card filling up] over at Storage Soup.
So, let's tackle the first question:
Q1. What will happen to IBM's real tape library business?
Come on! IBM is Number one in tape, we've had virtual tape libraries since 1997 (the first in the industry) and continue to do well in both virtual and real tape libraries. Both provide value to the customer, and both have their place as part of the overall "information infrastructure". This acquisition provides yet another choice for clients on our "supermarket" shelf.
(For those following the ["which is greener"] discussion, the robot of the IBM TS3500 real tape library consumes 185W per frame (when moving) and each tape drive consumes 50W (when actively working on a tape). Compared to 13W per SATA disk drive, each 6-drive frame of a TS3500 consumes as much electricity as 37 SATA disk drives. If you are not running backups 24x7, the total kWh per day for your tape library is actually much less, but as several people have pointed out, there are customers that do run backups 80-90 percent of the time. LTO-4 tapes can hold 800GB uncompressed, and SATA disks are now available in 1TB (1000 GB) size, so you can have fun with your own comparisons.)
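If you do want to have fun with your own comparisons, here is a minimal Python sketch of the frame-versus-disk arithmetic. The wattages are the ones quoted above. The duty-cycle model is my simplifying assumption: it treats the robot and drives as drawing nothing when idle, which real hardware will not quite match.

```python
# Power figures quoted in the post.
ROBOT_WATTS_PER_FRAME = 185   # TS3500 robot, while moving
TAPE_DRIVE_WATTS = 50         # per drive, while working on a tape
DRIVES_PER_FRAME = 6
SATA_DISK_WATTS = 13          # per SATA disk drive


def frame_watts():
    """Peak draw of one fully active 6-drive frame."""
    return ROBOT_WATTS_PER_FRAME + DRIVES_PER_FRAME * TAPE_DRIVE_WATTS


def equivalent_sata_disks():
    """Number of SATA disks drawing as much as one active frame."""
    return frame_watts() / SATA_DISK_WATTS


def daily_kwh(watts, duty_cycle):
    """Energy per day, assuming zero draw when idle (an assumption)."""
    return watts * 24 * duty_cycle / 1000


if __name__ == "__main__":
    print(f"Active frame: {frame_watts()} W")
    print(f"Equivalent SATA disks: {equivalent_sata_disks():.1f}")
    for duty in (0.25, 0.85, 1.0):
        print(f"Duty cycle {duty:.0%}: {daily_kwh(frame_watts(), duty):.2f} kWh/day")
```

Disks spin around the clock, so the disk side of the comparison is effectively at a 100 percent duty cycle, while the tape side depends on how much of the day your backups actually run.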
Meanwhile, Scott Waterhouse, one of the few people at EMC who understand tape workloads like backup and archive, takes me to task in his Backup Blog with his post [I want a Red Ferrari]. For those who are surprised that anyone at EMC might understand backup workloads, EMC did acquire a company called Legato, and perhaps Scott came from that acquisition. I've never met Scott in person, but based solely on his writings, he seems to know his stuff and makes strong arguments for using IBM Tivoli Storage Manager (TSM) with deduplication and virtual tape libraries.
While TSM does a good job of "deduplicating" at the client first, backing up only changed data, Scott feels database and email repositories must be backed up entirely each time, which is what happens in many other backup software products. Some clients might have 80 percent database/email and only 20 percent files, while others might have less than 20 percent database/email and 80 percent files, so this mix might influence whether deduplication will have a small or big benefit. If TSM has to back up the entire database, even though little has changed since the last backup, that is where deduplication on a virtual tape library can come in handy. For IBM DB2 and Oracle databases, the application-aware Tivoli Data Protection module interface in IBM TSM backs up only changed data, not the entire file. Thanks to IBM's acquisition of FilesX (also, coincidentally, from Israel), IBM can now extend this support to SQL Server databases as well. However, to be fair, Scott is partly correct: TSM does back up some database and email repositories in their entirety, which is why it is a good idea to have BOTH an IBM virtual tape library with deduplication and Tivoli Storage Manager to handle all cases. This brings us to the next question:
Q2. What will happen to IBM's patented "progressive backup" technology?
IBM will continue to use TSM's progressive backup technology. TSM already works great with Diligent virtual tape libraries. One example is LAN-free backup. In this configuration, the TSM client writes its backups directly to a virtual or real tape library, over the SAN, and then sends the list of files backed up to the TSM server over the LAN to record in its database. This can greatly reduce IP traffic on your LAN during peak backup periods. For more about this, see the IBM Redbook titled ["Get More Out of Your SAN with IBM Tivoli Storage Manager"].
Jon Toigo from DrunkenData asks [Did IBM Do Due Diligence Before Making Diligent Acquisition a Done Deal?] which is probably always a valid question. Unlike XIV, I wasn't part of the Diligent acquisition team, so I can't provide a first-hand account of the process. I am told that the IBM team did all the right things to make sure everything is going to turn out right. Sadly, many companies that make acquisitions in the IT industry fail to make them work. Fortunately, IBM is one of the few companies that has a great success record, with over 60 acquisitions in the past six years. In the Xconomy forum, Wade Roush writes [IBM and the Art of Acquisitions] and gives some insight into why IBM is different. Jon did not understand why Cindy Grossman, IBM VP of tape and archive solutions, ran the analyst conference call for this announcement, which brings me to the next question:
Q3. What is Diligent virtual tape library going to be categorized as, a disk system or a tape system?
IBM organizes its storage systems based on the host application workloads. Products to address disk workloads (SVC, DS8000 series, DS6000 series, DS4000 series, DS3000 series, N series, XIV Nextra) are in our disk systems group. Products that appear to host applications as tape systems, addressing workloads like backup and archive (tape drives, libraries and tape virtualization), are in our tape and archive group. IBM Diligent has two products, one for big workloads and one for medium workloads. Both look like tape systems, so our tape and archive team, who understand tape workloads like backup and archive the best, are obviously the best choice to support IBM Diligent in the mix.
IBM will offer both N series and Diligent deduplication capabilities. For disk workloads, IBM N series offers a post-process deduplication feature at no additional charge. For tape workloads, IBM will now offer an in-line deduplication feature with Diligent Technologies. Different workloads, different offerings.
As with any acquisition, there will be some changes. The 100 folks from Diligent will get to learn the IBM way of doing things. This brings me to our fifth and final question:
Q5. What is the correct spelling: deduplication or de-duplication?
It appears that Diligent has a corporate-wide standard to hyphenate this term (de-duplication), but the "word police" at IBM that control and standardize all "proper spellings, trademarks, and capitalization" sent me corporate instructions a few days ago that IBM does not hyphenate this term (deduplication). So, going forward, it will be "deduplication", or "dedupe" for short. I suspect one of the first tasks that our new IBMers from Diligent will be doing is removing all those hyphens from the [Diligent Technologies website]!
That's all for now, I'm off to Chicago, Illinois tomorrow!
I am still wiping the coffee off my computer screen, inadvertently sprayed when I took a sip while reading HDS' uber-blogger Hu Yoshida's post on storage virtualization and vendor lock-in.
HDS is a major vendor for disk storage virtualization, and Hu Yoshida has been around for a while, so I felt it was fair to disagree with some of the generalizations he made to set the record straight. He's been more careful ever since.
However, his latest post [The Greening of IT: Oxymoron or Journey to a New Reality] mentions an expert panel at SNW that included Mark O’Gara, Vice President of Infrastructure Management at Highmark. I was not at the SNW conference last week in Orlando, so I will just give the excerpt from Hu's account of what happened:
"Later I had the opportunity to have lunch with Mark O’Gara. Mark is a West Point graduate so he takes a very disciplined approach to addressing the greening of IT. He emphasized the need for measurements and setting targets. When he started out he did an analysis of power consumption based on vendor specifications and came up with a number of 513 KW for his data center infrastructure....
The physical measurements showed that the biggest consumers of power were in order: Business Intelligence Servers, SAN Storage, Robotic tape Library, and Virtual tape servers....
Another surprise may be that tape libraries are such large consumers of power. Since tape is not spinning most of the time they should consume much less power than spinning disk - right? Apparently not if they are sitting in a robotic tape library with a lot of mechanical moving parts and tape drives that have to accelerate and decelerate at tremendous speeds. A Virtual Tape Library with de-duplication factor of 25:1 and large capacity disks may draw significantly less power than a robotic tape library for a given amount of capacity."
Obviously, I know better than to sip coffee whenever reading Hu's blog. I am down here in South America this week, the coffee is very hot and very delicious, so I am glad I didn't waste any on my laptop screen this time, especially reading that last sentence!
In that report, a 5-year comparison found that a repository based on SATA disk was 23 times more expensive overall, and consumed 290 times more energy, than a tape library based on LTO-4 tape technology. The analysts even considered a disk-based Virtual Tape Library (VTL). Focusing just on backups, at a 20:1 deduplication ratio, the VTL solution was still 5 times more expensive than the tape library. If you use the 25:1 ratio that Hu Yoshida mentions in his post above, that would still be 4 times more expensive than a tape library.
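The step from "5 times at 20:1" to "4 times at 25:1" can be reproduced with a very simple model. This is a sketch under my own assumption, not the report's actual method: it treats the VTL's cost relative to tape as inversely proportional to the deduplication ratio, as if the cost were dominated by the amount of disk capacity needed. The function names are hypothetical.

```python
# Reference point quoted in the post: at 20:1, the VTL costs 5x the tape library.
REFERENCE_RATIO = 20
REFERENCE_COST_MULTIPLE = 5


def vtl_cost_multiple(dedupe_ratio):
    """VTL cost as a multiple of tape cost, assuming cost scales with
    1 / dedupe_ratio (a simplifying assumption, not the report's model)."""
    return REFERENCE_COST_MULTIPLE * REFERENCE_RATIO / dedupe_ratio


def break_even_ratio():
    """Deduplication ratio at which the VTL would match tape under this model."""
    return REFERENCE_COST_MULTIPLE * REFERENCE_RATIO


if __name__ == "__main__":
    for ratio in (20, 25, 50):
        print(f"{ratio}:1 -> {vtl_cost_multiple(ratio):.1f}x the cost of tape")
    print(f"Break-even under this model: {break_even_ratio()}:1")
```

Real VTL costs include controllers, software and maintenance that do not shrink with the deduplication ratio, so treat the break-even figure as optimistic for the VTL.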
I am not disputing Mark O'Gara's disciplined approach. It is possible that Highmark is using a poorly written backup program, taking full backups every day, to an older non-IBM tape library, in a manner that causes no end of activity to the poor tape robotics inside. But rather than changing over to a VTL, perhaps Mark might be better off investigating the use of IBM Tivoli Storage Manager, using progressive backup techniques, appropriate policies, parameters and settings, to a more energy-efficient IBM tape library. In well tuned backup workloads, the robotics are not very busy. The robot mounts the tape, and then the backup runs for a long time filling up that tape, and all the while the robot is idle waiting for another request.
(Update: My apologies to Mark and his colleagues at Highmark. The above paragraph implied that Mark was using bad products or configured them incorrectly, and was inappropriate. Mark, my full apology [here])
If you do decide to go with a Virtual Tape Library, for reasons other than energy consumption, doesn't it make sense to buy it from a vendor that understands tape systems, rather than buying it from one that focuses on disk systems? Tape system vendors like IBM, HP or Sun understand tape workloads as well as related backup and archive software, and can provide better guidance and recommendations based on years of experience. Asking advice about tape systems, including Virtual Tape Libraries, from a disk vendor is like asking for advice on different types of bread from your butcher, or advice about various cuts of meat at the bakery.
The butchers and bakers might give you answers, but it may not be the best advice.
Fellow blogger and cartoon writer Scott Adams writes in his Dilbert Blog posts about the [Monte Hall problem]. Monte Hall was the host of the American game show Let's Make a Deal. Here is an excerpt:
"The set up is this. Game show host Monte Hall offers you three doors. One has a car behind it, which will be your prize if you guess that door. The other two doors have goats. In other words, you have a 1/3 chance of getting the car.
You pick a door, but before it is opened to reveal what is behind it, Monte opens one of the doors you did NOT choose, which he knows has a goat behind it. And he asks if you want to stick with your first choice or move to the other closed door. One of those two doors has a car behind it. Monte knows which one but you don’t."
Mathematically, on your initial choice of doors, you have a 1/3 chance of picking the car, and 2/3 chance of picking a goat. But, after you make a choice, Monte knows which door(s) have goats behind them, and selects one that exposes the goat. If you stay with your initial choice, you still have a 1/3 chance that you win a car, but if you change your mind and choose the other door, your odds double, you have a 2/3 chance of winning. This is not obvious at all to most people, so Scott points people to the [Wikipedia entry] that provides the mathematical details.
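If the 1/3 versus 2/3 result still feels wrong, a short program can check it two ways. The sketch below is my own illustration in Python, not something from Scott's post or the Wikipedia entry: it enumerates the nine equally likely combinations of car position and first pick for an exact answer, and also runs a seeded simulation of the game as described above.

```python
import random
from fractions import Fraction

DOORS = (0, 1, 2)


def play(switch, rng):
    """Play one game; return True if the contestant wins the car."""
    car = rng.choice(DOORS)
    pick = rng.choice(DOORS)
    # The host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in DOORS if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in DOORS if d != pick and d != opened)
    return pick == car


def simulate(switch, trials=100_000, seed=2008):
    """Estimate the win rate over many games with a fixed seed."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


def exact(switch):
    """Exact win probability from the 9 equally likely (car, pick) pairs.
    Staying wins when the first pick was the car; switching wins otherwise."""
    wins = sum(
        (pick != car) if switch else (pick == car)
        for car in DOORS
        for pick in DOORS
    )
    return Fraction(wins, len(DOORS) ** 2)


if __name__ == "__main__":
    print("Exact:     stay", exact(False), " switch", exact(True))
    print("Simulated: stay", round(simulate(False), 3),
          " switch", round(simulate(True), 3))
```

The key line is the host's choice: because he never opens the door with the car, switching wins every time your first pick was a goat, which happens 2/3 of the time.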
What does this have to do with storage?
When you pick a disk system, you are hoping you pick the door with the car. You want a disk system that meets your performance requirements for your particular workload and is easy to deploy, configure and manage, with a low total cost of ownership for the three, four or five years you plan to use it. However, with over forty different storage vendors, there are some doors that might have goats. Some vendors have only 90 day warranties for their software, and I don't know any customers that replace their disk systems that often.
(It was pointed out to me that it was unfair in my last week's post about [Xiotech's low cost RAID brick], that I singled out EMC offering minuscule 90 day warranties for the software needed to run their disk systems. I apologize. I have since learned that HDS and HP also shaft their clients with 90 day warranties. Apparently there are a lot of vendors out there who lack confidence in the quality of their software!)
It would be nice if everyone published all of their performance benchmarks so that you can choose the right door with the car behind it, but sadly in the storage industry, not everyone participates with industry-standard benchmarks like the [Storage Performance Council].
In other cases, people make their choices based on past decisions. Perhaps someone before them chose one vendor over another, and it seems simple enough just to stay with the original choice. It is amazing how often people stay with their company's original choice, what we call in the industry the "incumbent vendor", without exploring alternatives.
So, if you bought an EMC, HDS or HP disk system in the last 90 days, it's not too late for you. Tell your local IBM rep that you are afraid you picked the door with the goat, and that you want to change your mind, and choose the other door and go with IBM instead.
You will double your chances of being happier with your new choice!
The Storage Networking World conference is over, and the buzz from the analysts appears to be focused on Xiotech's low-cost RAID brick (LCRB) called Intelligent Storage Element, or ISE.
(Full disclosure: I work for IBM, not Xiotech, in case there weren't enough IBM references on this blog page to remind you of that. I am writing this piece entirely from publicly available sources of information, and not from any internal working relationships between IBM and Xiotech. Xiotech is a member of the IBM BladeCenter alliance and our two companies collaborate in that regard.)
Fellow blogger Jon Toigo in his DrunkenData blog posted [I’m Humming “ISE ISE Baby” this Week] and then a follow-up post [ISE Launches]. I looked up Xiotech's SPC-1 benchmark numbers for the Emprise 5000 with both 73GB and 146GB drives, and at 8,202 IOPS per TB, it does not seem to be as fast as the IBM SAN Volume Controller's 11,354 IOPS per TB. Xiotech offers an impressive 5 year warranty (by comparison, IBM offers up to 4 years, and EMC I think is still only 90 days). Jon also wrote a review in [Enterprise Systems] that goes into more detail about the ISE.
Fellow blogger Robin Harris in his StorageMojo blog posted [SNW update - Xiotech’s ISE and the dilithium solution], feeling that Xiotech should win the "Best Announcement at SNW" prize. He points to the cool video on the [Xiotech website]. In that video, they claim 91,000 IOPS. Given that it took forty (40) 73GB drives (or 4 datapacs) in the previous example to get 8,202 IOPS for 1TB usable, I am guessing the 91,000 IOPS is probably 44 datapacs (440 drives) glommed together, representing 11TB usable. The ISE design appears very similar to the "data modules" used in IBM's XIV Nextra system.
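Here is how that guess works out, as a minimal Python sketch. The IOPS and drive counts are the ones quoted above. The big assumption, and it is only a guess, is that IOPS scale linearly as datapacs are added; the function name is hypothetical.

```python
# Figures quoted in the post for the SPC-1 configuration.
IOPS_PER_USABLE_TB = 8202
DATAPACS_PER_USABLE_TB = 4
DRIVES_PER_DATAPAC = 40 // DATAPACS_PER_USABLE_TB   # 40 drives in 4 datapacs

CLAIMED_IOPS = 91_000   # figure claimed in the Xiotech video


def estimate_configuration(claimed_iops):
    """Estimate usable TB, datapacs and drives behind a claimed IOPS figure,
    assuming IOPS scale linearly with capacity (an assumption)."""
    usable_tb = round(claimed_iops / IOPS_PER_USABLE_TB)
    datapacs = usable_tb * DATAPACS_PER_USABLE_TB
    drives = datapacs * DRIVES_PER_DATAPAC
    return usable_tb, datapacs, drives


if __name__ == "__main__":
    tb, datapacs, drives = estimate_configuration(CLAIMED_IOPS)
    print(f"{CLAIMED_IOPS} IOPS ~ {tb} TB usable, "
          f"{datapacs} datapacs, {drives} drives")
```

Linear scaling is generous, since real systems usually lose some efficiency as they grow, so the actual configuration behind the claim could be larger.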
Fellow blogger Mark Twomey from EMC in his StorageZilla blog posted [Xiotech: Industry second], which correctly points out that Xiotech's 520-byte block (512 bytes plus extra for added integrity) was not the first in the industry. Mark explains that EMC CLARiiON had this since the early 1990's, and implies in the title that this must have been the first in the industry, making Xiotech an industry second. Sorry Mark, both EMC and Xiotech were late to the game. IBM had been using 520-byte blocksize on its disk since 1980 with the System/38. This system morphed to the AS/400, and the blocksize was bumped up to 522 bytes in 1990, and is now called the System i, where the blocksize was bumped up yet again to 528 bytes in 2007.
While IBM was clever to do this, it actually means fewer choices for our System i clients, being only able to choose external disk systems that explicitly support these non-standard blocksize values, such as the IBM System Storage DS8000 and DS6000 series. (Yes, BarryB, IBM still sells the DS6000!) The DS6000 was specifically designed with the System i and smaller System z mainframes in mind, and in that niche does very well. Fortunately, as I mentioned in my February post [Getting off the island - the new i5/OS V6R1], IBM has now used virtualization, in the form of the VIOS logical partition, to allow i5/OS systems to attach to standard 512-byte block devices, greatly expanding the storage choices for our clients.
(Side note: SNW happens twice per year, so the challenge is having something new and fresh to talk about each time. While Andy Monshaw, General Manager of IBM System Storage, highlighted some of the many emerging technologies in his keynote address, IBM shipped on many of them prior to his last appearance in October 2007: thin provisioning in the IBM System Storage N series, deduplication in the IBM System Storage N series Advanced Single Instance Storage (A-SIS) feature, and Solid State Disk (SSD) drives in the IBM BladeCenter HS21-XM models. Of course, not everyone buys IBM gear the first day it is available, and IBM is not the only vendor to offer these technologies. My point is that for many people, these are still not yet deployed in their own data center, and so they are still in the future for them. However, since these IBM deliveries happened more than six months ago, they're old news in the eyes of the SNW attendees. While those who follow IBM closely would know that, others like [Britney Spears] may not.)
Back in the 1990s, when IBM was developing the IBM SAN Volume Controller (SVC), we generically referred to the managed disk arrays being virtualized by the SVC as "low-cost RAID bricks", or LCRB. The IBM DS3400 is a good example of this. However, as we learned, SVC is not just for LCRB; it adds value in front of all kinds of disk systems, including the not-so-low-cost EMC DMX and IBM DS8000 disk systems. ISE might make a reasonable back-end managed disk device for IBM SVC to virtualize. This gives you the new cool features of Xiotech's ISE, with IBM SVC's faster performance, more robust functionality and advanced copy services.
Next week, I'll be in South America in meetings with IBM Business Partners and storage sales reps.
My colleague, Marissa Benekos, is on location with her video camera in Orlando, Florida for the ComputerWorld [Storage Networking World] conference.
The IT specialists from the IBM booth were excited at David Bricker's debut on YouTube. Here's the rest of the gang in this [video].
Here's Andy Monshaw, General Manager of IBM System Storage and keynote speaker at this SNW event, summarizing IBM's "Information Infrastructure" strategy in 60 seconds in this [Youtube video].
This last video is Clod Barrera talking about the importance of security. Clod is an IBM Distinguished Engineer and Chief Technical Strategist for the IBM System Storage product line. Here is his [Youtube video]
It looks like Marissa is having a lot of fun taking these videos at the event. More videos, as we get them, will be posted to the [IBM videos channel].
My IBM colleague Marissa Benekos brought her hand-held video camera to the [Storage Networking World] conference in Orlando, Florida. I am not there, as I had a conflict with another conference going on here in Tucson, so am relying on Marissa to feed me information to blog about.
In this segment, she interviews "booth babe" David Bricker. I've known David a long time, and if you are there at the conference, tell him I sent you to visit him at the IBM booth.
David Bricker shows off some of the IBM System Storage product line at SNW in this YouTube video (2 minutes)
Sadly, I can't be in two places at once. SNW is a great conference to attend!
Well, it's Tuesday, and that means more IBM announcements!!!
Let's do a quick recap of what was announced for storage:
We now support 1000GB SATA-II drives in the DS4000 series. This is available for the DS4200 model 7V, DS4700 and DS4800, as well as the expansion drawers EXP420 and EXP810. When I asked our marketing team why we weren't going to say "1TB" like everyone else, they thought 1000GB sounds bigger. I guess I should not have asked that on April Fool's Day. For more details, see the IBM press releases for the [DS4200/EXP420 and DS4700/DS4800/EXP810].
IBM announced new machine code Release 1.4a for the IBM Virtualization Engine™ TS7700 virtual tape library for our System z mainframe customers. Various features come with this new level of machine code. See the IBM [Press Release] for more details.
Load balancing across the grid
Host control over the copy of logical volumes on a cluster by cluster basis
Option to gracefully remove an individual cluster from an existing grid
Initial-state reset for TS7700 database for cluster cleanup
Option to upgrade single-cache to dual-cache configuration
Also announced were updates to the 7214 model 1U2. Technically this is not in the IBM System Storage product line, but instead is designed specifically for our System p server line. This is a "media drawer" that allows you to have tape on one side, and optical on the other, in a single enclosure. IBM announced that you can now have DAT160 80GB drives that are read-write compatible with DAT72 and DDS4 drives, and half-high LTO-4 drives that can read LTO-2 media and are read-write compatible with LTO-3 media. Read the IBM [Press Release] for details.
Finally, if you are in the United States, Canada or the Caribbean, there is a special discount promotion for tape libraries purchased before June 20, 2008. This includes IBM TS3100, TS3200, TS3310 and TS3500 libraries. See the [Promotion Details] for eligibility.
IBM has added capability to the IBM TotalStorage Productivity Center for Replication. Here is a quick review of the different options for this component:
Base replication (uni-directional from primary to disaster site)
Two-site replication (bi-directional, including failover and failback)
Three-site replication (site awareness for all the copy sessions between all three sites in all situations)
Productivity Center for Replication supported all these levels for DS8000, DS6000 and ESS 800 disk models, but for SVC it only supported FlashCopy and Metro Mirror for the uni-directional base. IBM announced version 3.4 today that has added support for SVC for Global Mirror (asynchronous disk mirroring) and bi-directional failover/failback. This support lets you have "practice volumes" that allow IT managers to perform "disaster recovery exercises" without disrupting production workloads.
Also, for the DS8000, there is support for the new Space Efficient FlashCopy and Dynamic Volume Expansion features. Here is the IBM
The Productivity Center for Replication server can run on either a Windows/Linux-x86 server or a z/OS mainframe server. The Productivity Center for Replication on System z offers all the same new support for SVC and DS8000, as well as the incorporated Basic HyperSwap capability that I mentioned in my post last February [DS8000 Enhancements for the IBM System z10 EC].
Here are the IBM press releases for the TotalStorage Productivity Center for Replication on [Windows/Linux-x86 and System z] servers.
I'm at a Business Partner conference today, discussing these announcements and other topics, so I need to get back to those festivities.
On StorageZilla, fellow blogger Mark Twomey introduces the latest entrant from EMC to the blogosphere, in his post [Polly Pearson's blog].
Although we share the same name, with the same exact spelling, I would like to be the first to point out we are not related, at least as far as I know. Based solely on her post [Welcome to my Blog - Part 1], she is a year younger than I am, a lot better looking, majored in communications, and is not afraid to quit a crappy job for a much better job elsewhere. I, on the other hand, majored in engineering, but agree wholeheartedly not to stick in a crappy situation. There is such a skills shortage out there in the IT industry, with a cap on U.S. [H-1B visas] at a paltry [65,000 this year]. If you don't like your IT job, you should be able to quit and find another one in the IT industry you are more passionate about.
On a similar theme, over at DrunkenData, Jon Toigo's latest post asks if you are [Feeling Insecure About Your Job?] ScoreLogix’s Job Security Index has fallen in the United States, with a sharp drop specifically for IT jobs. Jon points out that while it might be easy to point out that a number went up or down, it is far more difficult to explain why it did so. He gives a good piece of career advice:
Want to keep your job? Play by the rules of the front office: demonstrate the value of what you do for the company from the standpoint of cost-savings, risk reduction and process improvement. Make yourself indispensable. If they don’t appreciate you then, you need to move on. You will always be hiding in your cubical and sweating a pink slip ...
So shine bright. Be remarkable. It is not always easy to communicate your value in a technical position to clueless non-technical managers. Certainly, writing a blog helps. Within IBM, there are over 3500 bloggers. Most post within the safe confines behind the firewall, but manage to generate ideas, present valid arguments, and get the conversation rolling with the right set of people that might be difficult otherwise in a company like IBM of over 350,000 employees scattered around the world. A few of us daringly blog in full public, and carry the conversation to our clients, prospects, analysts, journalists, Business Partners, and others within the IT industry.
So, Polly Pearson from EMC, although we have never met in person, I too welcome you to the blogosphere!
There is a difference between improving "energy efficiency" and reducing "power consumption".
Let's consider the average 100 watt light bulb, of which 5 watts generate the desired feature (light), and 95 watts are generated as undesired waste (heat). In this case, it would be 5 percent efficient. If you delivered a new light bulb that generated 3 watts of light for only 30 watts of energy, then you would have an offering that was more energy efficient (10 percent instead of 5 percent) and uses 70 percent less power (30 watts instead of 100 watts). This new "dim bulb" would not be as bright as the original, but has other desirable energy qualities.
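For readers who like to check the arithmetic, here is a minimal sketch of the two measures in Python. The function names are my own, chosen for illustration; the wattages are the ones from the light bulb example above.

```python
def efficiency(useful_watts, total_watts):
    """Fraction of the power consumed that ends up as the desired output."""
    return useful_watts / total_watts


def power_reduction(old_watts, new_watts):
    """Fraction by which total power consumption drops."""
    return (old_watts - new_watts) / old_watts


# Original bulb: 5 W of light out of 100 W consumed.
old_efficiency = efficiency(5, 100)
# "Dim bulb": 3 W of light out of 30 W consumed.
new_efficiency = efficiency(3, 30)
# Power saved by switching from 100 W to 30 W.
saving = power_reduction(100, 30)

print(f"old: {old_efficiency:.0%}, new: {new_efficiency:.0%}, power saved: {saving:.0%}")
```

The two numbers answer different questions: efficiency compares useful output to input, while power reduction only looks at the input side.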
Nearly all of the output of data center equipment results in heat. In The Raised Floor blog [It's Too Darn Hot!], Will Runyon explains how IBM researcher Bruno Michel in Zurich has developed new ways to cool chips with water shot through thousands of nozzles, much like capillaries in the human body. This is just one of many developments that are part of IBM's [Project Big Green].
But what if the desired feature is heat, and the undesired feature is light? In the case of Hasbro's toy [Easy-Bake Oven], a 100W incandescent light bulb is used to bake small cakes. This is generating 95W of desired heat, and only wasting 5W as light (unused inside the oven). That makes this little toy 95 percent energy efficient, but it consumes as much energy as any other 100W light bulb lamp or fixture in your house. With manufacturing switching from incandescent to compact fluorescent bulbs, this toy oven may not be around much longer.
While we all joke that it is just a matter of time before our employers make us ride stationary bicycles attached to generators to power our monstrous data centers, 23-year-old student Daniel Sheridan designed a see-saw for kids in Africa to play on that generates electricity for nearby schools. [Dan won the "most innovative product" at the Enterprise Festival].
Another approach is to improve efficiency by converting previously undesirable outcomes to desirable. Brian Bergstein has a piece in Forbes titled ["Heat From Data Center to Warm a Pool"]. Here's an excerpt:
"In a few cases, the heat produced by the computers is used to warm nearby offices. In what appears to be a first, the town pool in Uitikon, Switzerland, outside Zurich, will be the beneficiary of the waste heat from a data center recently built by IBM Corp. (nyse: IBM) for GIB-Services AG.
As in all data centers, air conditioners will blast the computers with chilly air - to keep the machines from exceeding their optimum temperature of around 70 degrees - and pump hot air out.
Usually, the hot air is vented outdoors and wasted. In the Uitikon center, it will flow through heat exchangers to warm water that will be pumped into the nearby pool. The town covered the cost of some of the connecting equipment but will get to use the heat for free."
I see a business opportunity here. Next to every data center lamenting about their power and cooling, build a state-of-the-art fitness center for the employees and nearby townspeople. Exercise on a stationary bicycle generating electricity, while your kids play on the see-saw generating electricity, and then afterwards the whole family can take a dip in the heated swimming pool. And if the company subscribes to the notion of a Results-Oriented Work Environment [ROWE], it could encourage its employees to take "fitness" breaks throughout the day, rather than having everyone there in the early morning or late evening hours, leveling out the energy generated.
In explaining the word "archive" we came up with two separate Japanese words. One was "katazukeru", and the other was "shimau". Both words mean "to put away", but the motivation that drives this activity changes the word usage. If you are clearing the dinner plates from the table after your meal, for example, it could be done for two reasons. The first reason, katazukeru, is because the table is important: you need the table to be empty or less cluttered to use it for something else, perhaps to play some card game, work on arts and crafts, or pay your bills. The second reason, shimau, is because the plates are important: perhaps they are your best tableware, used only for holidays or special occasions, and you don't want to risk having them broken. As it turns out, IBM supports both senses of the word archive. We offer "space management" when the space on the table (or disk or database) is more important, so older low-access data can be moved off to less expensive disk or tape. We also offer "data retention" where the data itself is valuable, and must be kept on WORM or non-erasable, non-rewriteable storage to meet business or government regulatory compliance.
The process of archiving your data from primary disk to alternate storage media can satisfy both motivations.
IBM offers software specifically to help with this archival process. For email archive, IBM offers [IBM CommonStore] for Lotus Domino and Microsoft Exchange. For database archive, including support for various ERP and CRM applications, IBM offers [IBM Optim] from the acquisition of Princeton Softech.
The problems occur when companies, under the excuse of simplification or consolidation, feel they can just use their backups as archives. They are taking daily backups of their email repositories and databases, and keeping these for seven to ten years. But what happens when their legal e-discovery team needs to find all emails or database records related to a particular situation, an employee, client or account? Good luck! Most backups are not indexed for this purpose, so storage admins are stuck restoring many different backups to temporary storage and combing through the files in hopes of finding the right data.
Backups are intended for operational recovery of data that is lost or corrupted as a result of hardware failures, application defects, or human error. Disk mirroring or remote replication might help with hardware failures, but any logical deletion or corruption of data is immediately duplicated, so it is not a complete solution. FlashCopy or Snapshot point-in-time copies are useful to go back a short time to recover from logical failures, but since they are usually on the same hardware as the original copies, may not protect against hardware failures. And then there's tape, and while many people malign tape as a backup storage choice, 71 percent of customers send backups to tape, according to a 2007 Forrester Research report.
Backups often aren't viable unless restored to the same hardware platform, with the same operating system and application software to make sense of the ones and zeros. For this reason, people typically only keep two to five backup versions, for no more than 30 days, to support operational recovery scenarios. If you make updates to your hardware, OS or application software, be sure to remember to take fresh new backups, as the old backups may no longer apply.
Archives are different. Often, these are copies that have been "hardened" or "fossilized" so that they make sense even if the original hardware, OS or application software is unavailable. They might be indexed so that they can be searched, so that you only have to retrieve exactly the data you are looking for. Finally, they are often stored with "rendering tools" that are able to display the data using your standard web browser, eliminating the need to have a fully working application environment.
Take any backup you might have from five years ago and try to retrieve the information. Can you do it? This might be a real eye-opener. You might have inherited this backup-as-also-archive approach from someone else, and are trying to figure out what to do differently that makes more sense. Call IBM, we can help.
Tim Ferriss started the festivities with [The Grand Illusion: The Real Tim Ferriss speaks]. He claimed that for the past year, he outsourced the writing of his blog to a writer from India, and an editor from the Philippines. Given that his post was dated March 31, and he writes frequently about the benefits of outsourcing, it looked like a legitimate post. However, Tim fessed up the following day, claiming that it was April 1 in Japan where he wrote it.
Guy Kawasaki wrote [April Fools' Stories You Shouldn't Believe] including my favorite #12 "Ruby on Rails cited Twitter as the centerpiece of its new 'Rails Can Scale' marketing program." Speaking of Twitter, fellow IBM blogger Alan Lepofsky from our Lotus Notes team wrote [Great, now there is Twitter Spam]. It looked like a real post, but then I realized, ... everything on Twitter is spam!
Topics like energy consumption and global warming were fodder for posts and pranks. The post [Was Earth Hour a joke again?] argued that the preparation of "Earth Hour" last week in effect used up more energy than the hour of this annual "lights-off event" actually saved. This reminded me of John Tierney's piece in the New York Times ["How virtuous is Ed Begley, Jr.?"] where a scientist explains that it is more "green" for the environment to drive a car short distances than to walk:
If you walk 1.5 miles, Mr. Goodall calculates, and replace those calories by drinking about a cup of milk, the greenhouse emissions connected with that milk (like methane from the dairy farm and carbon dioxide from the delivery truck) are just about equal to the emissions from a typical car making the same trip. And if there were two of you making the trip, then the car would definitely be the more planet-friendly way to go.
Wayan Vota, my buddy over at OLPCnews, writes in his post [Windows XO Child Centric Development] that the "Sugar" operating environment on the innovative Linux-based XO laptops will soon be re-named the "Windows XO Operating System", with their new motto "Windows XO: A Child-Centric Operating Platform for Learning, Expression and Exploration." The mocked up photo of an XO laptop with the Windows XO logo was excellent!
The economists from Freakonomics explain in [And While You're at it, Toss the Nickel] that it costs the US Government 1.7 cents to produce each penny. The US government loses $50 million each year making pennies. Each nickel costs 10 cents to produce. This one was dated March 31, so it could actually be true. Sad, but true.
My favorite, however, was EMC blogger Barry Burke's post ["5773 > c"] explaining how their scientists were able to reduce latency on the EMC SRDF disk replication capability:
What the de-dupe team found is that there is a hidden feature within recent generations of this chip that allow a single bit, under certain circumstances, to represent TWO bits of information.
Still, almost 34% of the total bits transferred were in fact aligned double-zeros, far more than all other bit combinations - and most importantly, these were quite frequently byte-aligned, as required by this new-found capability. Makes sense, if you think about it - most of those 32- and 64-bit integers are used to store numbers that are relatively small (years, months, days, credit charges, account balances, etc.). So that's why the team decided to use this new two-fer bit to represent "00".
Mathematically, if you can transmit 34% of the data using half as many bits, you reduce the number of bits you have to transfer in total by 17%. Which, while not necessarily earth-shattering, is nothing to be ashamed of. On top of the SRDF performance enhancements delivered in 5772 (30% reduction in latency or 2x the distance), this new enhancement adds another 17% latency improvement (or ~1.4x more distance at the same latency). Combined with 5772, SRDF/S customers could see a 50% reduction in latency. And 5773 allows SRDF/A cycle times to be set below 5 seconds (with RPQ) - this new feature adds a little headroom to maximize bandwidth efficiency for the shortest possible RPO.
Again, this looked real, until I did the math. Start with the speed of light in a vacuum of space ("c" in BarryB's title) which is roughly 300,000 kilometers per second, or put into more understandable units, 300 kilometers per millisecond. However, light travels slower through all other materials, and for fiber optic glass it is only 200 kilometers per millisecond. Sending a block of data across 100km, and then getting a response back that it arrived safely, is a total round-trip distance of 200km, so roughly 1 millisecond. However, EMC SRDF often takes two or three round-trips per write, versus IBM Metro Mirror on the IBM System Storage DS8000 which has got this down to a single round-trip. The number of round-trips has a much bigger effect on latency than EMC's double-bit data compression technique. With IBM, you only experience about 1 millisecond latency per write for every 100km distance between locations, the shortest latency in the industry.
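Here is a minimal sketch of that math in Python, for anyone who wants to plug in their own distances. It assumes the 200 kilometers per millisecond figure for fiber quoted above and ignores switch, controller and protocol overhead; the function and constant names are mine, invented for illustration.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in fiber optic glass


def write_latency_ms(distance_km, round_trips=1):
    """Propagation delay for one write: each round trip covers the distance twice."""
    return round_trips * (2 * distance_km) / FIBER_KM_PER_MS


def bits_saved_fraction(fraction_of_data, bits_kept_ratio=0.5):
    """Overall reduction if a fraction of the data is sent with fewer bits."""
    return fraction_of_data * (1 - bits_kept_ratio)


one_trip = write_latency_ms(100)                    # single round trip at 100km
three_trips = write_latency_ms(100, round_trips=3)  # three round trips at 100km
joke_saving = bits_saved_fraction(0.34)             # the "two-fer bit" claim

print(one_trip, three_trips, round(joke_saving, 2))
```

Going from three round trips to one cuts propagation delay by two thirds, which dwarfs the 17 percent from the joke post.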
It is good that, at least once a year, we are reminded to be skeptical of what we read in the blogosphere, and sometimes check the facts!
My father's favorite question is "What's the worst that could happen?" He is retired now, but worked at the famous [Kitt Peak National Observatory] designing some of the largest telescopes. Designing telescopes followed well-established mechanical engineering best practices, but each design was unique, so there was always a chance that the end result would not deliver the expected results. What's the worst that can happen? For telescopes, a few billion dollars are wasted and a few years are added to the schedule. Scrap it and start over. Nothing unrecoverable for the US government with unlimited resources and patience.
... the rest of the grimness on the front page today will matter a bit, though, if two men pursuing a lawsuit in federal court in Hawaii turn out to be right. They think a giant particle accelerator that will begin smashing protons together outside Geneva this summer might produce a black hole or something else that will spell the end of the Earth — and maybe the universe.
Scientists say that is very unlikely — though they have done some checking just to make sure.
The world’s physicists have spent 14 years and $8 billion (US dollars) building the Large Hadron Collider, in which the colliding protons will recreate energies and conditions last seen a trillionth of a second after the Big Bang. Researchers will sift the debris from these primordial recreations for clues to the nature of mass and new forces and symmetries of nature.
But Walter L. Wagner and Luis Sancho contend that scientists at the European Center for Nuclear Research, or CERN, have played down the chances that the collider could produce, among other horrors, a tiny black hole, which, they say, could eat the Earth. Or it could spit out something called a “strangelet” that would convert our planet to a shrunken dense dead lump of something called “strange matter.” Their suit also says CERN has failed to provide an environmental impact statement as required under the National Environmental Policy Act.
Although it sounds bizarre, the case touches on a serious issue that has bothered scholars and scientists in recent years — namely how to estimate the risk of new groundbreaking experiments and who gets to decide whether or not to go ahead.
What's the worst that can happen? Scientists now agree that it is sometimes difficult to predict, and some effects may be unrecoverable.
Unfortunately, this is not the only example of people attempting things they may not understand well enough. The web comic below has someone complaining they are out of disk space, and the sales rep suggests solving this with a few commands which will result in deleting all her files. Hopefully, most people reading will recognize this is meant as humor, and not actually attempt the code fragments to "see what they do".
This is a webcomic called "Geek and Poke". If you dare to read the punchline, click here: Funny Geeks - Part 5.
Warning: Do not try the code fragments unless you know what to expect!
Sadly, I often encounter clients who have a "keep forever" approach to their production data. When they are seriously out of space, they feel forced to either buy more disk storage, or start "the big Purge": deleting rows from their database tables, emails older than 90 days, or some other drastic measures. With a focus on keeping down IT budgets, I fear that these drastic measures are growing more common. What's the worst that could happen? You might need that data for defending yourself against a lawsuit, or need it to continue to provide service to a loyal client, or just continue normal business operations. I have visited companies where a junior administrator chose the "big Purge" option, without a full understanding of what they were doing, resulting in business disruption until the data could be recovered or re-entered.
IBM offers a better way. Data that may not be needed on disk forever could be moved to lower-cost tape, using up less energy and less floorspace in your data center. Solutions can automatically delete the data systematically based on chronological or event-based retention policies, with the option to keep some data longer in response to a "legal hold" request.
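To make the idea of chronological versus event-based retention with a "legal hold" concrete, here is a minimal sketch in Python. It is not how any IBM product implements retention; the Record class, its field names and the eligible_for_deletion function are hypothetical, made up for this illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Record:
    name: str
    created: date
    retention_days: int
    event_based: bool = False          # if True, the clock starts at event_date
    event_date: Optional[date] = None  # e.g. account closed, contract ended
    legal_hold: bool = False           # a hold overrides any expiration


def eligible_for_deletion(rec, today):
    """Return True only when the retention period has run out and no hold applies."""
    if rec.legal_hold:
        return False
    if rec.event_based:
        if rec.event_date is None:
            return False               # the triggering event has not happened yet
        start = rec.event_date
    else:
        start = rec.created            # chronological: clock starts at creation
    return today >= start + timedelta(days=rec.retention_days)


email = Record("old email", date(2000, 1, 1), retention_days=90)
held = Record("disputed email", date(2000, 1, 1), retention_days=90, legal_hold=True)
print(eligible_for_deletion(email, date(2000, 6, 1)))  # retention period has passed
print(eligible_for_deletion(held, date(2000, 6, 1)))   # kept because of the hold
```

The point of the sketch is the ordering of the checks: the hold is tested first, so nothing expires by accident while a request is outstanding.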
That's certainly better than to risk shrinking your business into a "dense dead lump"!
I got some interesting queries about IBM's Scale-Out File Services [SoFS] that I mentioned in my post yesterday [Area rugs versus Wall-to-Wall carpeting]. I thought I would provide some additional details of the product.
SoFS combines three key features: a global namespace, a clustered file system, and Information Lifecycle Management (ILM). Let's tackle each one.
Global Name Space
A long time ago, IBM acquired a company called Transarc that developed Andrew File System (AFS) and Distributed File System (DFS). These both provided global namespace capability, meaning that all of your files could be accessible from a single URL file tree. Imagine if you have data centers in Tucson, Austin, Raleigh and Chicago. Normally, to access files from each city, you would have to mount a unique IP address for that location, and then to get to files in a different city, you'd have to mount a second, and so on. But with a global namespace, you could mount a single drive letter Z: and access files simply by using Z:/Tucson/abc or Z:/Austin/xyz. IBM uses its DFS to make this happen.
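Conceptually, a global namespace is a lookup from one logical tree to the place where the data actually lives. The toy sketch below shows that idea only; it is not how DFS or SoFS works internally, and the host names and the resolve function are hypothetical.

```python
# Hypothetical mapping from the first path component to a site's file server.
SITE_SERVERS = {
    "Tucson": "files.tucson.example.com",
    "Austin": "files.austin.example.com",
    "Raleigh": "files.raleigh.example.com",
    "Chicago": "files.chicago.example.com",
}


def resolve(path, servers=SITE_SERVERS):
    """Split 'Z:/Site/rest/of/path' into (server for that site, relative path)."""
    _drive, _, rest = path.partition(":/")
    site, _, relative = rest.partition("/")
    if site not in servers:
        raise KeyError(f"no site named {site!r} in the namespace")
    return servers[site], relative


print(resolve("Z:/Tucson/abc"))
print(resolve("Z:/Austin/xyz"))
```

The user only ever sees the Z: tree; which server answers is decided behind the scenes.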
Just because you have access to a global namespace doesn't give you read/write authority to every file. IBM SoFS has full NTFS Access Control List (ACL) support, so that only those who can read or write data can access the files. A "hide unreadable" feature provides what I like to call "parental controls": you don't even get to see in your directory listing any file or subdirectory that you don't have access to. For example, if there is a directory with 50 projects, but you only have authority to three projects, then you only see the three subdirectories related to those projects, and nothing else.
There are other ways to get a global namespace. IBM also offers the IBM System Storage N series Virtual File Manager, Brocade offers Storage/X, and F5 acquired Acopia. These all work by putting a box in front of a set of independent NAS storage units, and giving you a single mount point to represent all of the file systems managed behind the scenes. This however can sometimes be a bottleneck for performance.
Clustered File System
Often, when you have a lot of data in one place, you are also expected to deliver that data to lots of clients with relatively good performance. Otherwise, end users revolt and get their own internal direct attach storage. To solve this, you need a clustered architecture that provides access in parallel to the data.
First, we start with a node that is optimized for CIFS and NFS access. We have clocked our node to run CIFS at 577 MB/sec, and NFS at 880 MB/sec, through a 10GbE pipe between a single client and a single SoFS node. Compare that to the 400 MB/sec you get today with 4Gbps FCP, or the 800 MB/sec you will get if you upgrade to 8 Gbps FCP, and quickly you recognize that this is comparable performance for demanding workloads.
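To put those numbers side by side, here is a small Python sketch. The 100 MB/sec per Gbps rule of thumb for FCP is simply what the figures above imply (400 MB/sec at 4 Gbps), and the 10GbE ceiling is raw line rate divided by eight, ignoring all protocol overhead. The names are mine, for illustration only.

```python
FCP_MB_PER_GBPS = 100  # implied by the post: 4 Gbps FCP delivers about 400 MB/sec

MEASURED_MB_PER_SEC = {"CIFS": 577, "NFS": 880}  # single client, single SoFS node


def fcp_mb_per_sec(gbps):
    """Approximate FCP throughput using the rule of thumb above."""
    return gbps * FCP_MB_PER_GBPS


def ten_gbe_raw_mb_per_sec():
    """Raw 10GbE line rate in MB/sec, before any protocol overhead."""
    return 10_000 / 8


for protocol, mb in MEASURED_MB_PER_SEC.items():
    print(
        f"{protocol}: {mb} MB/sec, "
        f"{mb / fcp_mb_per_sec(4):.2f}x of 4 Gbps FCP, "
        f"{mb / ten_gbe_raw_mb_per_sec():.0%} of raw 10GbE"
    )
```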
Then, you combine multiple nodes together, and have them all be able to read/write any file in the file system, and front-end that with a load-balancing Virtual IP address (VIPA) that spreads the requests around, and you've got yourself a lean and mean machine for accessing data.
In 2005, IBM delivered [ASC Purple] with the world's fastest file system. 1536 nodes were able to access billions of files in the 2 petabytes of data. The record of 126 GB/sec access to a single file was set, and has yet to be beaten by any other vendor since. This same file system is used in SoFS, as well as a variety of other IBM storage offerings.
The back-end storage can be SAS or FC-attached, from the DS3200 to our mighty DS8300 Turbo, as well as our IBM System Storage DCS9550 and SAN Volume Controller (SVC), and a variety of tape libraries.
Information Lifecycle Management
Lastly, we get to ILM. With SoFS, you can have different tiers of storage, high-speed SAS or FC disk, low-speed FATA and SATA disk, and even tape. Policy-based automation allows you to place any file onto any disk tier when created, and other policies can migrate or delete the data triggered by a certain threshold, age, or other criteria. The advantage is that this is on a file by file basis, so Z:/Tucson/Project could have a bunch of files, some of them on my FC disk, some of them on my SATA, and some on tape. The file path doesn't change when they move, and different files in the same directory can be on different tiers.
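A file-by-file policy can be pictured as a short list of rules evaluated against each file. The Python sketch below is only an illustration of that idea; it does not use the real SoFS policy syntax, and the age thresholds, tier names and FileEntry class are assumptions I made up for the example.

```python
from dataclasses import dataclass

# Hypothetical rules: (maximum days since last access, tier). First match wins.
TIER_RULES = [(30, "fc"), (365, "sata")]
ARCHIVE_TIER = "tape"  # anything older than every rule lands here


@dataclass
class FileEntry:
    path: str
    days_since_access: int
    tier: str = "fc"


def target_tier(days_since_access):
    """Pick the tier a file of this age should live on."""
    for max_days, tier in TIER_RULES:
        if days_since_access < max_days:
            return tier
    return ARCHIVE_TIER


def apply_policy(files):
    """Move each file to its target tier; the path never changes."""
    moved = []
    for entry in files:
        new_tier = target_tier(entry.days_since_access)
        if new_tier != entry.tier:
            entry.tier = new_tier
            moved.append(entry.path)
    return moved


project = [
    FileEntry("Z:/Tucson/Project/plan.doc", 5),
    FileEntry("Z:/Tucson/Project/q3-results.xls", 90),
    FileEntry("Z:/Tucson/Project/2005-audit.pdf", 800),
]
print(apply_policy(project))
print([(f.path, f.tier) for f in project])
```

Three files in the same directory end up on three different tiers, and none of their paths change.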
Data movement is bi-directional. If you know you will be using a set of files for an upcoming job, say perhaps quarter-end or year-end processing, you can pre-fetch those files from tape and move them to your fastest disk pool.
There is also integrated backup support. Typically, a large NAS environment is difficult to back up. Traditional methods take days to scan the directory tree looking for files in need of backup. A single SoFS node can scan a billion files in 95 minutes, and 8 nodes in a cluster can scan a billion files in under 15 minutes.
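A quick sanity check on those scan figures, sketched in Python. The perfectly linear scaling across nodes is my simplifying assumption, not a measured result; the function names are invented for the example.

```python
def files_per_second(files, minutes):
    """Average scan rate."""
    return files / (minutes * 60)


def ideal_cluster_minutes(single_node_minutes, nodes):
    """Scan time if the work split perfectly evenly across nodes (an assumption)."""
    return single_node_minutes / nodes


single_node_rate = files_per_second(1_000_000_000, 95)
eight_node_ideal = ideal_cluster_minutes(95, 8)

print(f"{single_node_rate:,.0f} files/sec on one node")
print(f"{eight_node_ideal} minutes on 8 nodes with ideal scaling")
```

Ideal scaling would put eight nodes just under 12 minutes, so "under 15 minutes" leaves some room for real-world coordination overhead.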
Recovery is even more impressive. When you recover, SoFS brings back the entire directory structure first, with all the file names in place. This would make it appear that all the data is restored, but actually it is still on tape. When you access individual files, it will then drive the recovery of that file, so your applications and end users basically determine the priority of the recovery. Traditional methods would wait until every file was restored before letting anyone access the system.
SoFS is part of IBM's [Blue Cloud] initiative that was launched last November 2007. Of course, IBM isn't the only one competing in this space. HDS has partnered with BlueArc, HP has acquired PolyServe, and Sun acquired CFS for their Lustre file system. Isilon and Exanet are start-up companies with some offerings. EMC acquired Rainfinity, and has hinted at a Hulk/Maui project that they might deliver later this year or perhaps in 2009, but by then it might be a dollar short and a day late.
But why wait? IBM SoFS is available today and is orders of magnitude more scalable!
As a consultant, I am often asked to help design the architecture for the information infrastructure. A useful analogy to gather requirements and preferences is the difference between area rugs and wall-to-wall carpeting. Area rugs are not secured to the floor and cover only a portion of the floor area. Carpets are generally tacked or cemented to the floor, often with an underlay of cushion padding, stretched across the entire floor surface, out to all four walls of each room.
Each has its pros and cons, and often is a matter of preference. Some people like area rugs because they can choose a different style for each room, match the decor and color scheme of furniture, and use these to define each living space. Ever since paleolithic man put animal skins on the floor of their cave, people recognize that cold, hard and ugly floors could be covered up with something soft and more attractive. Others prefer wall-to-wall carpeting because they want to walk around the house barefoot, have their young children crawl on their hands and knees, and give the entire house a unified look and feel. This is often an inexpensive option when compared against the cost of individual rugs.
The same is true for an information infrastructure. Some prefer the "area rug" approach: this style of storage for their email, this other type of storage for their databases, and perhaps a third for their unstructured file systems. When customers ask what storage I would recommend for their SAP application, or their Microsoft Exchange email environment, or their Business Intelligence (BI) software, I recognize they are taking this "area rug" approach.
Like area rugs, having different storage can focus on specific attributes of the workload characteristics. It also insulates against company-wide changes, the dreaded "rip-and-replace" of replacing all of your storage with something from a different vendor. With "area rug" storage, you can support a dual-vendor or multi-vendor strategy, and upgrade or replace each on its own schedule.
Thanks to open standards and industry-standard benchmarks, changing out one storage solution for another is as simple as rolling up an area rug, and putting another one in its place that is similar in size dimensions.
Others may prefer the "wall-to-wall carpeting" approach: one disk system type, one tape library type, one network type, that provides unified management and minimizes the need for unique skills. Generally, the choice of NAS, SAN or iSCSI infrastructure is done company-wide, and might strongly influence the set of products that will support that decision. For example, those with a mix of mainframe and distributed servers looking for SAN-attached storage may look at an [IBM System Storage DS8000] and [TS3500 tape library] that can provide support for FICON and FCP.
Those looking at NAS or iSCSI might consider the IBM System Storage N series products, "unified storage" supporting iSCSI, FCP and NAS protocols. If you want the "wall-to-wall" to stretch across all the sites in your globally integrated enterprise, IBM's scalable NAS product, Scale-Out File Services [SoFS], provides a global name space in combination with a clustered file system that provides incredible scalability and performance based on field-proven technology used by the majority of the [Top 100 supercomputer] deployments.
IBM can help you design an information infrastructure that fits either approach.
Soon, the U.S. is switching on-air television signals from analog to digital format. The switch-over happens February 17, 2009. According to the [Federal Communications Commission], Americans have until this Monday, March 31, to request up to two 40-dollar coupons towards the purchase of digital-to-analog converter boxes so that the on-air digital signals can be used with existing analog-only television equipment.
(For my readers outside the United States, a bit of background explanation may be necessary. Americans consider access to television a self-evident and unalienable right. According to a Pew Research report [Luxury or Necessity?] 64 percent of Americans consider a television set a necessity, and 33 percent consider paid providers, like cable or satellite, a necessity. Even prisoners in U.S. jails are allowed to watch television!)
Taking advantage of the "Y2K crisis"-like nature of this 2/17/2009 deadline, paid providers have been advertising that this deadline only applies to on-air customers. Those who have cable or satellite can continue to use their analog equipment. I have been a subscriber to Cox Cable for some time, and my parents recently made the switch as well. Two weeks ago, however, my parents called me in a panic. Cox Cable chose to move one channel, Turner Classic Movies (TCM), from their analog line-up over to their digital line-up. They thought this wasn't going to happen until 2/17/2009! They asked me to investigate and provide them alternative options.
I spoke to a Cox Cable representative.
Did Turner force Cox Cable to do this? Did they digitize their entire collection of movies? No, Cox Cable is choosing to send the TCM signal over the digital bandwidth, and it is converted back to analog by their set-top box.
Do customers who now get one less channel get a discount? No, same price, less service.
Why move a single channel over? Eventually, everything is going digital, and this is a small "baby step" to get people to switch over.
But TCM is a collection of grainy, black-and-white movies from the 1950s and 1960s, so it is probably the channel that benefits least from converting to digital. Why choose TCM specifically? TCM is "commercial-free" so provides no additional revenue opportunity. Moving this to digital frees up an analog channel to run a new "on demand" service that could generate additional revenue for Cox Cable.
What would it take in terms of additional cost and equipment to watch TCM in digital? A set-top digital box from Cox Cable, which costs a one-time 10 dollars for installation by a professional technician, plus 11 dollars per month for the extra "service" provided.
Do I need a High-Def television set or other equipment? No, the digital signal for TCM is standard format, so no HD equipment required.
I currently split my cable signal, so that I can watch one channel and record another, or record two separate channels at the same time, using a standard format VCR and Tivo. Can I continue to do this with the digital set-top box? Yes, absolutely.
I decided to give it a try, and a technician was scheduled to perform the installation last Sunday, which was Easter holiday for some people. The technician was able to connect the set-top box directly to my television set, but the signal is converted to a single "Channel 3", forcing the use of a separate Cox Cable remote control unit to set the channel on the set-top box. He set the set-top box to TCM (channel 199) and showed that the TCM channel was now available again.
How would my VCR or Tivo record anything? You have to set the set-top box manually to the appropriate channel desired, then set the VCR or Tivo to record "Channel 3".
How would I record one channel while watching another? That does not appear possible with this set-top box. If we split before entering the set-top box, then that equipment would get the analog channels only, not TCM.
How about recording two different channels concurrently? No way.
I feel bad for the technician. He spent two hours on his Easter Sunday to install service that I was told by their sales rep would work with my equipment, only to find out it won't, and he ended up having to take it all back out and cancel the work order. He doesn't even get paid overtime for this.
So, I am back to where I was before, analog channels minus the TCM channel. However, the lesson is clear: eventually everything is going to digital, and people may not realize what this means to them.
Yesterday marked the first day of Spring here in the Northern hemisphere, and often this means it is time for some "Spring cleaning". This is a great time to re-evaluate all of your stuff and clean house.
In the bits-vs-atoms discussion, Annie Leonard has a quick [20-minute video] about the atoms side of stuff, from extraction of natural resources, production, distribution, consumption, to final disposal.
On the bits side of things, the picture is much different.
We don't really extract information; rather, we capture it, and lately that process is done directly into digital formats, from digital photography, digital recording of music, and so on. A lot of medical equipment now takes X-rays and other medical images directly into digital format. By 2011, it is estimated that as much as 30 percent of all storage will be for holding medical images.
Production refers to the process of combining raw materials and making them into something useful. The same applies to information: there are a variety of ways to make information more presentable. In the Web 2.0 world, these are called Mashups, combining raw information in a manner that is more usable. Fellow IBM blogger Bob Sutor discusses IBM's latest contribution, SMash, in his post [Secure Mashups via SMash].
According to Tim Sanders, 90 percent of business information is distributed by email, but less than 10 percent of employees are formally trained to distribute information correctly. Here's a quick 3-minute trailer to his "Dirty Dozen" rules of how to do email properly.
I have not watched the DVD that this trailer is promoting, but I certainly agree with the overall concept.
This week I also had the pleasure to hear [Art Mortell], author of the book The Courage to Fail: Art Mortell's Secrets to Business Success. He gave an inspirational talk about how to deal with our stressful lives. One key point was that stress often comes from our own expectations. This is certainly true of how we consume information. Oftentimes our expectations determine how well we read, watch or listen to information being presented. Sometimes information is factually correct, but presented in such a boring manner that it is just too difficult to consume.
John Windsor on YouBlog takes this one step further, asking [Are you predictable?] He makes a strong case on why presenting in a predictable manner can actually hurt your chances of communication.
And finally, there is disposal. We are all a bunch of digital pack-rats. With atoms, you eventually run out of closet space; with bits the problem is not as obvious, and often can be resolved by spending your way out of it. On average, companies are expanding their storage capacity by 57 percent every year. That worked well when dollar-per-GB prices of disk dropped to match, but now technology advancements are slowing down. Disk will not be dropping in price as fast as you need, and now might be a good time to re-evaluate your "Keep everything forever" strategy.
Consider "Spring cleaning" to be an excellent excuse to evaluate the data you have on your disk systems. Should it be on disk? Will it be accessed often enough to justify that cost? Does it need immediate online access times, or can waiting a minute or two for a tape mount from an automated library be sufficient? Does it represent business value?
I have been to customers that have discovered a lot of "orphan data" on their disk systems. This is data that does not belong to anyone currently working at the company. Maybe the owners of the data retired, were laid off, or even fired, but nobody bothered to clean up their files after they left the company.
I've also seen a lot of "stale data" on disk, data that has not been read or written in the past 90 days. Are you spending 13-18 watts of power to spin each disk drive just to contain data nobody ever looks at?
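For readers who want a rough first look at their own "stale data", here is a minimal Python sketch that walks a directory tree and lists files that have been neither read nor written in the past 90 days. The function name `find_stale_files` and its parameters are my own illustration, not part of any IBM product. It also assumes the file system actually records access times; many systems are mounted with options that update them rarely or not at all, so treat the results as a hint rather than an audit.

```python
import os
import time

STALE_DAYS = 90  # threshold taken from the paragraph above


def find_stale_files(root, days=STALE_DAYS, now=None):
    """Return (path, size_in_bytes) for files not read or written in `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Both the last access and the last modification must be old.
            if max(st.st_atime, st.st_mtime) < cutoff:
                stale.append((path, st.st_size))
    return stale
```

Summing the sizes in the returned list gives a quick estimate of how much disk capacity could be moved to tape or deleted.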
In some cases, orphan or stale data represents business value, and needs to be kept around for business or legal reasons. Perhaps some government regulation requires you to retain this information for some years. In that case, rather than deleting it, move it to tape, perhaps using the IBM System Storage DR550 to protect it for the time required and handle its eventual disposal.
Certainly something to think about, while you snap the ears off those chocolate bunnies, watching your kids run around looking for eggs. Enjoy your weekend!
Jon Toigo over at DrunkenData writes in his post [A Wink and a Nod] about the benefits of the new IBM System z10 Enterprise Class mainframe. Here's an excerpt about storage:
"The other key point worth making about this scenario is that storage behind a z10 must conform to IBM DASD rules. That means no more BS standards wars between knuckle-draggers in the storage world who continue to mitigate the heterogeneous interoperability and manageability of distributed systems storage using proprietary lock in technologies designed as much to lock in the consumer and lock out the competition as to deliver any real value. That has got to be worth something."
For z/OS and TPF operating systems, disk must support CCW commands over ESCON or FICON connections, or NFS commands over the Local Area Network. However, most of the workloads that are being ported over from x86 platforms will probably be running Linux on System z images, and Linux supports both CCW and SCSI protocols, the latter over native FCP connections through a Storage Area Network (SAN) or via iSCSI over the Local Area Network. Many SAN directors support both FCP and FICON, and the z10 also supports both 1Gbps and 10Gbps Ethernet, so you may not have to invest in any new networking gear.
The best part is that you may not have to migrate your data. The IBM System Storage SAN Volume Controller is supported for Linux on System z, and with "image mode" you can leave the data in its original format on its original disk array. Many file systems are now supported by Linux, including Windows NTFS with the latest NTFS-3G driver.
If your data is already on NAS storage, such as the IBM System Storage N series disk systems, then the IBM z10 can access it directly, from z/OS, z/VM or Linux.
Have lots of LTO tape data? Linux on System z supports LTO as well.
Jon continues his rant with a question about porting Microsoft Windows applications. Here's another excerpt:
"For one, what do we do with all the Microsoft servers. There is no Redmond-sanctioned approach to my knowledge for virtualizing Microsoft SQL Server or Exchange Server in a mainframe partition."
Yes, it is possible to run Windows on a mainframe through emulation, but I feel that's the wrong approach. Instead, the focus should be on running "functionally equivalent" programs on the native mainframe operating systems, and again Linux is often the best choice for this. Switching from Windows to Linux may not be "Redmond-sanctioned", but it gets the job done.
Instead of SQL Server, consider something functionally equivalent like IBM's DB2 Universal Database, or perhaps an open source database like MySQL, PostgreSQL or Apache Derby. Well-written applications use standard SQL calls, so if the application does not try to use unique, proprietary features of MS SQL Server, you are in good shape.
In my discussion last November on [Microsoft Exchange email server], I mentioned that Bynari makes a functionally equivalent email server on Linux that works with your existing Microsoft Outlook clients. Your end-users wouldn't know you migrated to a mainframe! (well, they might notice their email runs faster)
So if your data center has three or more racks of Sun, Dell or HP "pizza box" or "blade" x86 servers, chances are you can migrate the processing over to a shiny new IBM z10 EC mainframe, save some money in the process, without too much impact to your existing Ethernet, SAN or storage system infrastructure. IBM can even help you dispose of the old x86 machines so that their toxic chemicals don't end up in any landfill.
I figured I need to say something about "green" on this special holiday (and yes, I am partially Irish, and the majority of my siblings have bright red hair and freckles as it runs through my family).
Last week, I had the pleasure to meet [Dr. Jia Chen]. She has a PhD in nanotechnology and works in IBM's Watson Research Center. She is recognized as one of the top 35 scientists under 35 years of age by MIT, top 15 of the "Nano 50", and one of the top 80 in the National Academy of Engineering.
The two of us presented to clients at the BMW Performance Center in Greenville, SC, on the topic of the "Green" IT data center. She covered all of the advancements IBM is making on the server side, and I covered all the things on the storage side.
The BMW Performance Center is part "briefing conference location" and part "driving school". Everyone had a great time watching the crazy stunts of the professional drivers skidding and spinning on a closed course. Some had the opportunity to actually drive or ride in the cars themselves.
BMW is introducing its own "energy efficiency initiative" with their [X3 Hybrid] vehicle, which will be manufactured at the Greenville, SC plant.
A [recent survey] conducted by Fleishman-Hillard Research indicates that the majority of disk-only customers are now looking at adding tape back into their infrastructure. Here are some excerpts:
"Over two thirds of surveyed businesses said they were looking to add tape storage back into their overall network infrastructure and of those respondents, over 80-percent plan to add tape storage solutions within the next 12 months. The survey, which was taken in the fourth quarter of 2007, focused on the views of more than 200 network administrators and mid-level tech specialists at mid-size to large companies throughout the United States.
The integration of tape storage into a tiered information infrastructure is highly strategic for customers, due to its low cost of ownership, low energy consumption and portability for data protection, said Cindy Grossman, Vice President of Tape Storage Systems, IBM. LTO tape technology is a perfect choice for enterprise and mid-sized customer with its proven reliability, high capacity, high performance and ability to address data security with built-in encryption and data retention requirements for the evolving data center.
According to the survey, 58-percent of the respondents use a combination of disk and tape for long term archiving, 24-percent use tape exclusively, and 18-percent employ a disk-only approach. In this group, 68-percent of the current disk-only users plan to start using tape for long-term archiving, and over half (58-percent) plan to add tape for short-term data protection. The survey findings suggest that disk-only users may be experiencing a bit of buyer's remorse, said David Geddes, senior vice president at Fleishman-Hillard Research, who oversaw the study. We found that a wide majority of companies that employ purely disk-based approaches are looking to quickly include tape in their backup and archiving strategies."
While disk provides online data access and availability, tape provides additional data protection and security, lower total cost of ownership (TCO), lower energy consumption (Tape is more "green"), and can be an important part of a long term data retention and compliance strategy.
Disk is more costly, more energy hungry, and some data, although it must be retained, may seldom, if ever be looked at, so why keep it spinning?
Speaking of TCO, a recent 5-year TCO analysis by the Clipper Group titled [“Disk and Tape Square Off Again”] compared storing 2.4PB of data long term on SATA disk and on an LTO tape library. The disk system was 23 times more costly and used 290 times as much energy as tape. Even with a data dedupe system like IBM System Storage N series, disk was still 5 times more costly than the tape system.
The Linear Tape Open (LTO) consortium -- consisting of IBM, Hewlett-Packard (HP) and Quantum -- just released its "LTO-5" plans. With 2:1 compression, you will be able to pack up to 3TB of data on a single tape cartridge. And while the dollar-per-GB decline for disk is slowing down to 25-30 percent per year, tape continues to decline at a healthy 40 percent rate, so the price gap between disk and tape will actually widen even further over the next few years.
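To see how quickly that gap widens, here is a small back-of-the-envelope sketch in Python. It assumes the annual decline rates quoted above stay constant from year to year, which is my simplification and not a forecast; the function name `relative_gap` is purely illustrative.

```python
def relative_gap(disk_decline, tape_decline, years):
    """Factor by which the disk-to-tape price ratio grows after `years` years.

    Declines are annual fractions, e.g. 0.25 for a 25 percent yearly drop in
    dollar-per-GB. Assumes both rates stay constant (a simplification).
    """
    yearly_factor = (1.0 - disk_decline) / (1.0 - tape_decline)
    return yearly_factor ** years


# Disk declining 25 or 30 percent per year against tape declining 40 percent.
for disk_rate in (0.25, 0.30):
    for years in (1, 3, 5):
        gap = relative_gap(disk_rate, 0.40, years)
        print(f"disk -{disk_rate:.0%}/yr, {years} yr: ratio grows {gap:.2f}x")
```

With disk at 25 percent and tape at 40 percent, the disk-to-tape price ratio grows by a factor of 1.25 each year, so whatever the gap is today, it compounds.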
At IBM, our standard is to have a limit of 200MB per user mailbox. A few of us get exceptions and have up to a 500MB limit because of the work we do. By comparison, my personal Gmail account is now up to 6500MB. When the IBM limit is exceeded, you are unable to send out any mail until the mailbox is brought down below the limit and a request to be "re-enabled for send" is approved, a situation we call "mail jail".
The biggest culprits are attachments. Only 10 percent of emails have attachments, but those that do take up 90 percent of the total space! People attach a 15MB presentation or document, and copy the world on the distribution list. Everyone saves their notes with these attachments, and soon, the limits are blown. Not surprisingly, deduplication has been cited as a "killer app" to address email storage, exactly for this reason. If all the users have their mailboxes stored on the same deduplication storage device, it might find these duplicate blocks, and manage to reduce the space consumed.
A better practice would be to avoid this in the first place. Here are the techniques I use instead:
Point to the document in a database
We are heavy users of Lotus Notes databases. These can be encrypted and controlled with Access Control Lists (ACL) that determine who can create or read documents in each database. Annually, all the database ACLs are validated so that people can confirm that they continue to have a need-to-know for the documents in each database. Sending a confidential document as a "document link" to a database entry takes only a few bytes, and all the recipients that are already on the ACL have access to that document.
Point to the document on a web page
If the document is available on an internal or external website, just send the URL instead of attaching the file. Again, this takes only a few bytes. We have websites accessible only to all internal employees, websites that can be accessed only by a subset of employees with special permissions and credentials based on their job role, and websites that are accessible to our IBM Business Partners.
In my case, if I happen to have a blog posting that answers a question or helps illustrate an idea, I will send the "permalink" URL of that blog post in my email.
Point to the document on shared NAS file system
Internally, IBM uses a "Global Storage Architecture" (GSA) based on IBM's Scale-Out File Services [SoFS] with everyone getting initially 10GB of disk space to store files, with the option to request more if needed. The system has policy-based support for placing and migrating older data to tape to reduce actual disk usage, and combines a clustered file system with a global name space.
My SoFS space is now up to 25GB, and I store a lot of presentations and whitepapers that are useful to others. A URL with "ftp://" or "http://" is all you need to point to a file in this manner, and greatly reduces the need for attachments. I can map my space as "Drive X:" on my Windows system, or as an NFS mount point on my Linux system, which allows me to easily drag files back and forth.
Departments that don't need to offer "worldwide access" use NAS boxes instead, such as the IBM System Storage N series.
Pointing to files in a shared space, rather than as attachments in email, may take some getting used to. I've had a few recipients send me requests such as "can you send that as an attachment (not a URL)" because they plan to read it on the airplane or train, where they won't have online connectivity.
"Have you invested in the latest and greatest in collaboration technology but still feel people are still not collaborating? How many Microsoft Sharepoint servers and IBM Quickplaces remain relatively untouched or only used by the organization's technorati? I think it's a big problem because this narrow view of collaboration starts to get the concept a bad name: "yeah, we did collaboration but no one used it." And then there the issue of the vast amount of money wasted and opportunities lost. We can't afford to loose faith in collaboration because the external environment is moving in a direction that mandates we collaborate. The problems we face now and into the future will only increase in complexity and it will require teams of people within and across organizations to solve them."
Well, sending pointers instead of attachments works for me, and has kept me out of "mail jail" for quite some time now.
IDC, an independent industry analyst firm, put out their 4Q07 "Worldwide Disk Storage Systems Quarterly Tracker" report. Here is an excerpt from their [press release]:
"Worldwide external disk storage systems factory revenues posted 9.8 percent year-over-year growth in the fourth quarter of 2007 (4Q07) and totaling $5.3 billion (USD), according to the IDC Worldwide Disk Storage Systems Quarterly Tracker. For the quarter, the total disk storage systems market grew to $7.5 billion (USD), up 7.6 percent from the prior year's fourth quarter. Total disk storage systems capacity shipped reach 1,645 petabytes, growing 56.3 percent."
For those wondering how an industry could grow 56.3 percent in capacity, but only 7.6 percent in revenue, it is because the average dollar-per-GB dropped in 2007 from $6.63 down to $4.56 (USD), representing a 31 percent decline. In the past, disk prices dropped 40 to 60 percent each year, so single digit growth was the best major vendors could hope for. Lately, however, this has slowed to a 25 to 35 percent decline, while client demand for capacity continues at the 60 percent pace, which means that vendors could achieve double digit revenue growth soon.
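The arithmetic behind that explanation is easy to check. The sketch below, in Python, treats revenue as simply capacity shipped times average dollar-per-GB, which is my simplification and ignores differences between the quarterly and full-year figures; the function names are illustrative only.

```python
def price_decline(old_price_per_gb, new_price_per_gb):
    """Fractional drop in dollar-per-GB, e.g. 0.31 for a 31 percent decline."""
    return 1.0 - new_price_per_gb / old_price_per_gb


def revenue_growth(capacity_growth, decline):
    """Revenue growth implied by capacity growth and a price decline.

    Assumes revenue = capacity shipped x average dollar-per-GB.
    """
    return (1.0 + capacity_growth) * (1.0 - decline) - 1.0


decline_2007 = price_decline(6.63, 4.56)
print(f"2007 price decline: {decline_2007:.1%}")
print(f"Implied revenue growth: {revenue_growth(0.563, decline_2007):.1%}")

# What-if: 60 percent capacity growth against a range of price declines.
for d in (0.25, 0.35, 0.40, 0.60):
    print(f"decline {d:.0%}: revenue growth {revenue_growth(0.60, d):+.0%}")
```

Under that simplification, 56.3 percent more capacity at roughly 31 percent lower prices works out to about 7.5 percent more revenue, in line with the 7.6 percent IDC reported.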
Once again, IBM was ranked number 1 in total disk storage. No surprise there. Here are the details:
"Total Disk Storage Systems Market
In the total worldwide disk storage systems market, IBM lead the market with 22.9 percent followed by HP with 18.1 percent revenue share. EMC maintained the third position with 16.0 percent revenue share.
For the full year, the total disk storage systems market posted 6.6 percent growth to $26.3 billion (USD). In the total worldwide disk storage systems market, IBM and HP lead the market in statistical tie with 20.1 percent and 19.4 percent revenue share, respectively. EMC maintained the third position with 15.2 percent revenue share."
But why focus just on disk? IDC also released their "Worldwide Combined Disk and Tape Storage 3Q07 Market Share Update", and IBM was number one for that as well, taking in 21.9 percent share. Here's a quote of IBM VP Barry Rudolph in [CNN Money]:
"IBM's continued leadership in the storage hardware market reaffirms our strategy to provide the most comprehensive tiered portfolio of storage offerings, ranging from software and services to disk and tape storage solutions," said Barry Rudolph, Vice President, Storage Stack Solutions, IBM. "IBM is the clear choice for providing information infrastructure solutions that offer the most cost-efficient, streamlined approach to help our customers increase overall productivity and maximize performance."
It is looking like 2008 is going to be a good year for IBM!
IPv4, IPv6, Wireless Mesh networking? No problem! You know Linux networking inside and out.
Extensive knowledge of BIND, DHCPD, Squid, Apache, security, etc.
Experience working with [Moodle] would be most excellent (it is basically a PHP web application that maintains MySQL databases for lesson plans, homework assignments and other school related information)
Adept with Python scripting or could learn it quickly. OLPC has standardized on Python for scripting (although knowledge in Perl and PHP won't hurt either)
You look to implement a practical solution that less skilled sysadmins can easily maintain over a cooler but more complicated solution.
You play well with others. You don’t alienate collaborators with rude e-mails that assert your technical superiority (even though you are)
Your primary concern is meeting the educational needs of kids and teachers. You rate technical awesomeness a distant second to meeting those critical needs.
I've been working with Dev, Bryan and Sulochan for the past three months (remotely here from Tucson, AZ) but we've come to a point where we need on-site expertise. I will continue to provide remote support.
Given the number of readers who have contacted me over the past year looking for an IT job (or a different job because they are not happy where they are), this could be an amazing experience.
It's been a while since I've talked about [Second Life].
The latest post on eightbar [Spimes, Motes and Data centers] discusses IBM's use of virtual world technology to analyze data centers in three dimensions. New World Note asks [What's The Point Of 3D Data Centers?] One would think that a simple monitoring tool based on a two-dimensional floor plan would be enough to evaluate a data center.
Enter Michael Osias, IBM (a.k.a Illuminous Beltran in Second Life). Some of the leading news sites have begun to notice some 3D data centers that he has helped pioneer. UgoTrade writes up an article about Michael and the media attention in [The Wizard of IBM's 3D Data Centers].
Of course, in presenting these "Real Life/Second Life" (RL/SL) interactive technologies, IBM is sometimes the target of ridicule. Why? Because IBM is 10 years ahead of everyone else. So, are there aspects of a data center where 3D interfaces make sense? I think there are.
IBM TotalStorage Productivity Center has an awesome "topology viewer" that shows what servers are connected to which switches, to which disk systems and tape libraries. This is all done in a 2D diagram, generated dynamically with data discovered through open standard interfaces, similar to what you might draw manually with tools like Visio. Imagine, however, how much more powerful it would be as a 3D viewer, with virtual equipment mapped to the physical location of each piece of hardware on the data center floor, including its position in the rack.
Designing computer room air conditioning (CRAC) systems is actually a three dimensional problem. Cold air is fed underneath the raised floor, comes up through strategically placed "vent" tiles, and is taken in at the front of each rack. Hot air comes out the back of each rack, and hopefully finds a ceiling duct intake to get cooled again. The temperature six inches off the floor is different than the temperature six feet off the floor, and 3D monitor tools could be helpful in identifying "hot spots" that need attention. In this case "spimes" represent sensors in the 3D virtual world, able to report back information to help diagnose problems or monitor events.
After many people left the mainframe in favor of running a single application per distributed server, the pendulum has finally swung back. Companies are discovering the many benefits of changing this behavior. "Re-centralization" is the task at hand. Thanks to virtualization of servers, networks and storage, sharing common resources can once again claim the benefits of economies of scale. In many cases, servers work together in collective units for specific applications that might benefit if consolidated together onto the same equipment.
IBM's "New Enterprise Data Center" vision recognizes that people will need to focus on the management aspects of their IT infrastructure, and 3D virtual world technologies might be an effective way to get the job done.
I am always amused by the way the IT industry tries to solve problems. Take, for example, the process of backups. The simplest approach is to back up everything, and keep "n" versions of that. Simple enough for a small customer who has only a handful of machines, but does not scale well. In my post [Times a Million], I coined the phrase "laptop mentality", referring to people's inability to think through solutions at large scale.
Apparently, I am not alone. Steve Duplessie (ESG) wrote in his post [Random Thoughts]:
"I may even get to stop yelling at people to stop doing full backups every week on non-changing data (which is 80 %+) just because that's how they used to do it. They won't have a choice. You can't back up 5X your current data the way you do (or don't) today."
Hu Yoshida (HDS) does a great job explaining that there are three ways to perform deduplication for backups:
Pre-processing. Have the backup software not back up unchanged data.
Inline processing. Have an index to filter the output of the backup as it sends data to storage.
Post-processing. Have the receiving storage detect duplicates and handle them accordingly.
"A full backup of 1TB data base tablespace is taken on day one. The next day another full backup is taken and only 2GB of that backup has any changes.
Using traditional full backup approaches after 2 nights, the backup capacity required is 2 x 1TB = 2TB
One method of calculating de-duplication ratios could yield a low ratio:
Total de-duplicated backup capacity used = 1TB + 2GB = 1.002TB
If the de-duplication ratio compares the amount of total physical storage used to the total amount that would have been used by traditional backup methods, the ratio = 2TB / 1.002TB = approximately 2:1
Another method of calculating de-duplication ratios could yield a high ratio:
Total de-duplicated backup capacity used still = 1.002TB
If the de-duplication ratio compares the amount of data stored in the most recent (second) backup to the amount that would have been used by traditional backup methods, the ratio = 1TB / 2GB = 1000GB / 2GB = 500:1"
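The two calculations in the excerpt above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic as quoted: it uses 1TB = 1000GB, as the excerpt does, and the function name `dedupe_ratios` is my own.

```python
def dedupe_ratios(full_backup_gb, changed_gb, nights=2):
    """Return (cumulative_ratio, latest_backup_ratio) for nightly full backups.

    cumulative_ratio compares everything traditional full backups would have
    stored against what is physically kept after de-duplication.
    latest_backup_ratio looks only at the most recent backup.
    """
    traditional_gb = nights * full_backup_gb
    stored_gb = full_backup_gb + (nights - 1) * changed_gb
    cumulative = traditional_gb / stored_gb
    latest = full_backup_gb / changed_gb
    return cumulative, latest


cumulative, latest = dedupe_ratios(full_backup_gb=1000, changed_gb=2)
print(f"Cumulative ratio: {cumulative:.2f}:1")   # roughly 2:1
print(f"Latest-backup ratio: {latest:.0f}:1")    # 500:1
```

Same data, same technology, and the advertised ratio differs by a factor of about 250 depending on which method is quoted.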
While IBM also offers deduplication in the IBM System Storage N series disk systems, I find that for backup, it is often more effective to apply best practices via IBM Tivoli Storage Manager (TSM). Let's take a look at some:
Exclude Operating System files
Why take full backups of your operating system every day? Yes, deduplication will find a lot to reduce from this, but best practices would exclude these. TSM has an include/exclude list, and the default version excludes all the operating system files that would be recovered from "bare machine recovery" or "new system install" procedures. Often, if the replacement machine has different gear inside, your OS backups aren't what you need, and a fresh OS install may determine this and install different drivers or different settings.
Exclude Application programs
Again, yes, if there are several machines running the same application, you probably have opportunity for deduplication. However, unless you match these up with the appropriate registry or settings buried down in the operating system, recovering just application program files may render an unusable system. Applications are best installed from a common source that is either "pushed" through software distribution, or "pulled" from an application installation space.
If you have TB-sized databases, and are only doing full backups daily to protect them, have I got a solution for you. IBM and others have software that is "application-aware" and "database-aware" enough to determine what has changed since the last backup and copy only that delta. Taking advantage of the TSM Application Programming Interface (API) allows both IBM and third party tools to take these delta backups correctly.
Which leaves us with user files, which are often unique enough from the files of other users that they would not benefit from file-level deduplication. Backing up changed data only, as TSM does with its patented ["progressive incremental backup"] method, generally gets most of the benefits described by deduplication, without having to purchase storage hardware features.
Of course, if two or more users have identical files, the question might be why these are not stored on a common file share. NAS file share repositories can greatly reduce each user keeping their own set of duplicates. It is interesting that some block-oriented deduplication, such as that found in the IBM System Storage N series, can get some benefit because some user files are often derivatives of other files, and there might be some 4 KB blocks of data in common.
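To illustrate why derivative files can share blocks, here is a minimal Python sketch of fixed-size block deduplication. It is a generic illustration under my own assumptions (fixed 4 KB blocks compared by SHA-256 fingerprint) and makes no claim about how the N series actually implements the feature; the function name `count_blocks` is hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KB, the block size mentioned above


def count_blocks(files, block_size=BLOCK_SIZE):
    """Return (total_blocks, unique_blocks) across an iterable of bytes objects."""
    seen = set()
    total = 0
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            # Identical blocks produce identical fingerprints.
            seen.add(hashlib.sha256(block).digest())
            total += 1
    return total, len(seen)


# Two "documents" that share their first 4 KB block but differ afterwards.
original = b"A" * 4096 + b"B" * 4096
derivative = b"A" * 4096 + b"C" * 4096
total, unique = count_blocks([original, derivative])
print(f"{total} blocks written, {unique} unique blocks stored")
```

Note that fixed-size blocks only match when the shared content stays aligned on block boundaries; inserting a single byte at the front of a file would shift every block that follows.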
Last November, I visited a customer in Canada. All of their problems were a direct result of taking full backupsevery weekend. It put a strain on their network; it used up too many disk and tape resources; and it took too long tocomplete. They asked about virtual tape libraries, deduplication, and anything else that could help them. The answer was simple: switch to IBM Tivoli Storage Manager and apply best practices.
On Tuesday, I covered much of the Feb 26 announcements, but left the IBM System Storage DS8000 for today so that it can haveits own special focus.
Many of the enhancements relate to z/OS Global Mirror, which we formerly called eXtended Remote Copy or "XRC", not to be confused with our "regular" Global Mirror that applies to all data. For those not familiar with z/OS Global Mirror, here is how it works. The production mainframe writes updates to the DS8000, and the DS8000 keeps track of these in cache until a "reader" can pull them over to the secondary location. The "reader" is called System Data Mover (SDM), which runs in its own address space under the z/OS operating system. Thanks to some work my team did several years ago, z/OS Global Mirror was able to extend beyond z/OS volumes and include Linux on System z data. Linux on System z can use a "Compatible Disk Layout" (CDL) format (now the default) that meets all the requirements to be included in the copy session.
IBM has over 300 deployments of z/OS Global Mirror, mostly banks, brokerages and insurance companies. The feature can keep tens of thousands of volumes in one big "consistency group" and asynchronously mirror them to any distance on the planet, with the secondary copy recovery point objective (RPO) only a few seconds behind the primary.
Extended Distance FICON
Extended Distance FICON is an enhancement to the industry-standard FICON architecture (FC-SB-3) that can help avoid degradation of performance at extended distances by implementing a new protocol for "persistent" Information Unit (IU) pacing. This deals with the number of packets in flight between servers and storage separated by long distances, and can keep a link fully utilized at 4Gbps FICON up to 50 kilometers. This is particularly important for the z/OS Global Mirror "reader" System Data Mover (SDM). By having many "reads" in flight, this enhancement can help reduce the need for spoofing or channel-extender equipment, or allow you to choose lower-cost channel extenders based on "frame-forwarding" technology. All of this helps reduce your total cost of ownership (TCO) for a complete end-to-end solution.
This feature will be available in March as a no-charge update to the DS8000 microcode. For more details, see the [IBM Press Release].
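To see why the number of IUs in flight matters, a back-of-the-envelope calculation helps. The sketch below assumes roughly 5 microseconds of one-way delay per kilometer of fiber and about 400 MB/sec of usable throughput on a 4Gbps link. Both figures, and the 8 KB unit size in the loop, are illustrative assumptions of mine, not numbers from the announcement.

```python
import math

# Assumptions for illustration only (not from the announcement):
FIBER_DELAY_US_PER_KM = 5   # about 5 microseconds per km, one way, in optical fiber
LINK_BYTES_PER_US = 400     # about 400 MB/sec usable on a 4Gbps link


def bytes_in_flight(distance_km):
    """Bytes that must be outstanding to keep the link busy for one round trip."""
    round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    return LINK_BYTES_PER_US * round_trip_us


def units_in_flight(distance_km, unit_bytes):
    """How many transfers of unit_bytes each are needed to cover one round trip."""
    return math.ceil(bytes_in_flight(distance_km) / unit_bytes)


if __name__ == "__main__":
    for km in (10, 50, 100):
        print(km, "km:", bytes_in_flight(km), "bytes outstanding,",
              units_in_flight(km, 8192), "units of 8 KB (hypothetical size)")
```

The farther apart the sites, the more data has to be outstanding before the first acknowledgement returns. If the protocol caps that amount too low, the link sits idle waiting, which is the degradation that persistent IU pacing is meant to avoid.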
z/OS Global Mirror process offload to zIIP processors
To understand this one, you need to understand the different "specialty engines" available on the System z.
On distributed systems where you run a single application on a single piece of server hardware, you might pay "per server", "per processor" or lately "per core" for dual-core and quad-core processors. Software vendors were looking for a way to charge smaller companies less, and larger companies more. However, you might end up paying the same whether you use a 1GHz Intel or a 4GHz Intel processor, even though the latter can do four times more work per unit time.
The mainframe has a few processors for hundreds or thousands of business applications. In the beginning, all engines on a mainframe were general-purpose "Central Processor" or CP engines. Based on their cycle rate, IBM was able to publish the number of Million Instructions per Second (MIPS) that a machine with a given number of CP engines can do. With the introduction of side co-processors, this was changed to "Millions of Service Units" or MSU. Software licensing can charge per MSU, and this allows applications running in as little as one percent of a processor to get appropriately charged.
One of the first specialty engines was the IFL, the "Integrated Facility for Linux". This was a CP designated to only run z/VM and Linux on the mainframe. You could "buy" an IFL on your mainframe much cheaper than a CP, and none of your z/OS application software would count it in the MSU calculations because z/OS can't run on the IFL. This made it very practical to run new Linux workloads.
In 2004, IBM introduced "z Application Assist Processor" (zAAP) engines to run Java, and in 2006, the "z Integrated Information Processor" (zIIP) engines to run database and background data movement activities. By not having these counted in the MSU number for business applications, it greatly reduced the cost for mainframe software.
Tuesday's announcement is that the SDM "reader" will now run in a zIIP engine, reducing the costs for applications that run on that machine. Note that the CP, IFL, zAAP and zIIP engines are all identical cores. The z10 EC has up to 64 of these (16 quad-core) and you can designate any core as any of these engine types.
Faster z/OS Global Mirror Incremental Resync
One way to set up a 3-site disaster recovery protection is to have your production synchronously mirrored to a second site nearby, and at the same time asynchronously mirrored to a remote location. On the System z, you can have site "A" using synchronous IBM System Storage Metro Mirror over to nearby site "B", and also have site "A" sending data over to site "C" using z/OS Global Mirror. This is called "Metro z/OS Global Mirror" or "MzGM" for short.
In the past, if the disk in site A failed, you would switch over to site B, and then send all the data all over again. This is because site B was not tracking what the SDM reader had or had not yet processed. With Tuesday's announcement, IBM has developed an "incremental resync" where site B figures out what the incremental delta is to connect to the z/OS Global Mirror at site "C", and this is 95% faster than sending all the data over.
IBM Basic HyperSwap for z/OS
What if you are sending all of your data from one location to another, and one disk system fails? Do you declare a disaster and switch over entirely? With HyperSwap, you only switch over the disk systems, but leave the rest of the servers alone. In the past, this involved hiring IBM Global Technology Services to implement a Geographically Dispersed Parallel Sysplex (GDPS) with software that monitors the situation and updates the z/OS operating system when a HyperSwap has occurred. All application I/Os that were writing to the primary location are automatically re-routed to the disks at the secondary location. HyperSwap can do this for all the disk systems involved, allowing applications at the primary location to continue running uninterrupted.
HyperSwap is a very popular feature, but not everyone has implemented the advanced GDPS capabilities. To address this, IBM now offers "Basic HyperSwap", which is actually going to be shipped as IBM TotalStorage Productivity Center for Replication Basic Edition for System z. This will run in a z/OS address space, and use either the DB2 RDBMS you already have, or provide you an Apache Derby database for those few out there who don't have DB2 on their mainframe already.
Update: There has been some confusion on this last point, so let me explain the key differences between the different levels of service:
Basic HyperSwap: single-site high availability for the disk systems only
GDPS/PPRC HyperSwap Manager: single- or multi-site high availability for the disk systems, plus some entry-level disaster recovery capability
GDPS/PPRC: highly automated end-to-end disaster recovery solution for servers, storage and networks
I apologize to all my colleagues who thought I implied that Basic HyperSwap was a full replacement for the more full-function GDPS service offerings.
Extended Address Volumes (EAV)
Up until now, the largest volume you could have was only 54 GB in size, and many customers still are using 3 GB and 9 GB volume sizes. Now, IBM will introduce 223 GB volumes. You can have any kind of data set on these volumes, but only VSAM data sets can reside on cylinders beyond the first 65,280. That is because many applications still think that 65,280 is the largest cylinder number you can have.
This is important because a mainframe, or a set of mainframes clustered together, can only have about 60,000 disk volumes total. The 60,000 is actually the Unit Control Block (UCB) limit, and besides disk volumes, you can have "virtual" PAVs that serve as an alias to existing volumes to provide concurrent access.
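A quick calculation shows what the larger volumes buy you under that limit. This is only a rough sketch: it uses the round 60,000 figure, decimal units, and assumes every UCB is spent on a base volume, which overstates things since PAV aliases also consume UCBs. The function name is mine.

```python
UCB_LIMIT = 60_000  # approximate number of addressable devices quoted above


def max_capacity_pb(volume_gb, volumes=UCB_LIMIT):
    """Upper bound on addressable disk capacity in PB (decimal units)."""
    return volume_gb * volumes / 1_000_000


if __name__ == "__main__":
    for size_gb in (3, 9, 54, 223):
        print(f"{size_gb:>3} GB volumes: up to {max_capacity_pb(size_gb):.2f} PB")
```

With 54 GB volumes, the ceiling works out to a little over 3 PB; with 223 GB volumes, it is over 13 PB, and that is before you give back any UCBs to use as PAV aliases.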
Aside from the first item, the Extended Distance FICON, the other enhancements are "preview announcements" which means that IBM has not yet worked out the final details of price, packaging or delivery date. In many cases, the work is done, has been tested in our labs, or running beta in select client locations, but for completeness I am required to make the following disclaimer:
All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice. Availability, prices, ordering information, and terms and conditions will be provided when the product is announced for general availability.
Yesterday, I asked if you were prepared for the future. The future is now. Today, IBM announced its ["New Enterprise Data Center"] vision and strategy, which spans software, hardware and services in dealing with the latest challenges that our clients are faced with today, or will face sooner or later this century.
Here's an excerpt:
Align IT with business goals These changes demand that IT improve cost and service delivery, manage escalating complexity, and better secure the enterprise. And aligning IT more closely with the business becomes a primary goal. The new enterprise data center is an evolutionary new model for efficient IT delivery that helps provide the freedom to drive business innovation. Through a service oriented model, IT will be able to better manage costs, improve operational performance and resiliency, and more quickly respond to business needs. This approach will deliver dynamic and seamless access to IT services and resources, improving both productivity and satisfaction.
IBM's Vision for the New Enterprise Data Center The new enterprise data center can improve the integration of people, process, and technology in your business to help you improve efficiency and effectiveness. As you implement a new enterprise data center strategy, your infrastructure becomes open, efficient, and easy to manage. And your IT staff can move from a focus on fixing IT problems to solving business challenges. Ultimately your processes become standardized and efficient, focused on business needs rather than technology.
A lot was announced today, so I will give a quick recap now, and cover specific areas over the rest of the week.
IBM System z10 Enterprise Class
IBM introduces its most powerful mainframe. Before you think "Wait, that's a mainframe, that doesn't apply to me", stop to consider all that IBM has done to make the mainframe an "open system" without sacrificing security or availability:
Open standard connectivity, including TCP/IP and now 6Gbps Infiniband and 10GbE Ethernet.
Unix System Services. Yes, z/OS is certified to provide UNIX interfaces for today's applications.
HFS and zFS file systems that can be mounted, shared, and used by traditional z/OS applications and JCL.
Linux and Java. Many of today's largest websites are run on mainframes behind the scenes.
Extreme bandwidth. The z10 EC handles up to 336 FICON channels (4Gbps) for large data processing workloads
The z10 EC is as powerful as 1,500 x86 (such as Intel or AMD) servers, but consumes 85 percent less floorspace and 85 percent less energy. (They should put a "green" stripe down the front of this box just to remind everyone how energy efficient this server really is!) For more on the z10 EC, see the [Press Release].
Enhanced IBM System Storage DS8000
With the XIV acquisition taking the role as the best place to put unstructured files for Web 2.0 applications, the IBM DS8000 can focus on its core strength, managing databases and online transactions for the mainframe. There's enough here to justify its own post, so I will cover this later.
IT Service Management Center for z (ITSMCz)
Trust me, I don't make up these acronyms. IT Service Management refers to the policies and procedures for managing an IT environment, such as following the best practices documented in the IT Infrastructure Library (ITIL). In the past, IBM tools have focused on Linux, UNIX and Windows on distributed servers, but today ITSMCz brings all of that to the mainframe! (or perhaps more correct to say "brings the mainframe to all that"!)
IT Transformation & Optimization - Infrastructure Strategy and Planning services
I don't make up the names of our service offerings either. However, one thing is clear, it is time for people to re-evaluate their current data center, and come up with a new plan. The average data center is 15 years old. According to Gartner Group, more than 70 percent of the world's "Global 1000" organizations will have to make significant modifications to their data centers in the next five years. IBM can help, and is rolling out a new set of services specifically to help clients make this transition, to better align their IT to their business strategies.
Economic Stimulus Package
IBM borrowed this idea from the U.S. government. IBM Global Financing is offering special terms and rates for new equipment installed by December 31 this year.
Want to learn more? Read this 15-page [IBM's Vision] white paper.
IBM developerWorks, which hosts this blog, suggests posting once per day. General blogging guidelines I have found suggest 300 to 500 words per post. Most magazine and newspaper articles range around 700 words. In my book, [Inside System Storage: Volume I], I had 165 posts covering twelve months, with an average of 636 words per post.
longer posts, perhaps once a week or less
I've seen several executives adopt this approach. When they have something to say, out comes a long speech, in written form, when the occasion deems it necessary. Some of the more technical blogs adopt this approach also, going into great detail on product specifications and supporting material to make their case.
Either way, it comes out to perhaps 2000 words per week. That can be 20 posts of 100 words each, four posts of 500 words each, or one long post for the week. Currently, I post about 2-5 times per week, with posts 500-700 words long. I can try to mix short posts with long ones, to give you readers some variety. Post a comment below on whether you prefer that I do more/shorter or fewer/longer.
As for the future of IT...
In a recent post by fellow blogger (and author) Nick Carr titled [Alan Turing, cloud computing and IT's future], he mentions he has a free download of a 7-page PDF called "IT in 2018: from Turing's machine to the computing cloud." It's a quick read, covering many of the points in his most recent book, The Big Switch. Here's an excerpt:
As for computer professionals, the coming of the World Wide Computer means a realignment of the IT workforce, with some jobs disappearing, some shifting from users to suppliers, and others becoming more prominent. On the supplier side, we’ll likely see booming demand for the skills required to design and run reliable, large-scale computing plants. Expertise in parallel processing, virtualization, artificial intelligence, energy management and cooling, encryption, high-speed networking, and related fields will be coveted and rewarded. Much software will also need to be written or rewritten to run efficiently on the new infrastructure. In a clear sign of the new labor requirements, Google and IBM have teamed up to spearhead a major education initiative aimed at training university students to write programs for massively parallel systems.
Some interesting insights from Google can be read in the New York Times' Freakonomics blog, where Steve Dubner interviews Google's chief economist: [Hal Varian Answers Your Questions]. Hal comes up with some clever answers to some rather tough questions. It's worth a read.
It is good to have futurists like this. However, as we caution in IBM, those who seek a life through a crystal ball... must often settle for a diet of broken glass. I will close with one of my favorite quotes.
"As I've said many times, the future is already here. It's just not very evenly distributed." --- William Gibson (science-fiction author)
So, yes, I may sometimes look at the rear-view mirror. However, there is a common theme from Nick Carr to Steve Dubner to William Gibson. They also look back to the past to give insights on how things might unfold in the future.
My view is that for some the future is already here. IBM already offers the product, service or solution that might be just what you need, but you just haven't gotten it yet. Future for you, but past for us. For others, the future is repeating a pattern we have already seen in the past. Understanding what happened back then helps us be better prepared to understand what is happening now, in the directions and trends we forecast moving forward.
EMC Corporation (NYSE:EMC) today announced it has been positioned as a leader in the Forrester Wave™: Enterprise Open Systems Virtual Tape Library (VTL), Q1 2008 by Forrester Research, Inc. (January 31, 2008), an independent market and technology research firm. EMC achieved a position as a leader in the Forrester Wave report on virtual tape libraries based on the largest installed base of the EMC® Disk Library family of systems, its broad ecosystem interoperability. Virtual tape libraries emulate tape drives and work in conjunction with existing backup software applications, enabling fast backup and restoration of data by using high-capacity, low-cost disk drives.
EMC was the first major vendor in the open systems virtual tape library market as it introduced the EMC Disk Library in April 2004 and today is a leading provider of open systems virtual tape solutions, with systems that are designed for businesses and organizations of all sizes.
While the press release implies that "EDL equals VTL", Chuck tries to explain they are in fact very different. Here is an excerpt from his blog post:
Virtual Tape Libraries vs. Disk Libraries
As many of you know, VTLs have been around for a while. They use disk as a cache -- they buffer the incoming backup streams, do some housekeeping and stacking, then turn around and write tape efficiently. When you go to restore, you're usually coming back off of tape, unless the backup image in question is sitting in the disk cache.
Now, there is nothing wrong with the VTL approach, but it was conceived in a time when disks were horribly expensive. It was also pretty clear to many of us that disks were going to be a whole lot cheaper in the near future, and this fundamental assumption wouldn't be valid for much longer.
I kept thinking in terms of disk as a direct target for a backup application. No modifications to the backup application. Native speed of sequential disks for both backup and restore. Tape positioned as a backup to the backup. Use the strengths of the underlying array (e.g. CLARiiON) for performance, availability, management, etc.
We ended up calling the concept a "disk library" to differentiate from the VTLs that had come before it. It was a different value proposition and offering, based on the emergence of lower-cost disk media.
... It's nice to see we're at 1,100+ customers, and still going strong.
For those new to the blogosphere, there is a difference between "Press Releases" as formal corporate communications versus "Blog Posts" which are informal opinions of the individual blogger, which may or may not match exactly the views of their respective employer. As we've learned many times before, one should not treat terms like "first" or "leader" in corporate press releases literally! Let's explore each.
Was EDL the first "open systems" Virtual Tape Library?
This is implied by the Forrester report. Chuck mentions the "VTLs that had come before it" in his blog, and many people are aware that IBM and StorageTek had introduced mainframe-attached VTLs in the 1990s. But what about VTL for "open systems"?
(Hold aside for the moment that the IBM System z mainframe is an open system itself, with z/OS certified as a bona fide UNIX operating system by [the Open Group] standards body. Most analysts and research firms usually refer only to the non-mainframe versions of UNIX and Windows. Alternative definitions for "open systems" can be found in [Web definitions or Wikipedia]. I will assume Forrester meant non-mainframe servers.)
IBM announced AIX non-mainframe attachment via SCSI connectivity to the IBM 3494 Virtual Tape Server (VTS) on Feb 16, 1999, with general availability on May 28, 1999. That's nearly FIVE YEARS before the April 2004 introduction of EDL. IBM VTS support for Sun Solaris and Microsoft Windows came shortly thereafter in November 2000, and support for HP-UX a bit later in June 2001. One of my 17 patents is for the software inside the IBM 3494 VTS, so like Chuck, I can take some pride in a successful product.
(I don't remember if StorageTek, which was subsequently acquired by Sun, had ever supported non-mainframe operating systems with their Virtual Storage Manager [VSM] offering, but if they did, I am sure it was also before EMC.)
Last week, another EMC blogger, BarryB (aka [the Storage Anarchist]), took me to task in comments on my post [IBM now supports 1TB SATA drives]. He felt that IBM should not claim support, given that the software inside the IBM System Storage N series is developed by NetApp. He compared this to the situation of HP and Sun re-badging the HDS USP-V disk system. If someone else wrote the software, BarryB opines, IBM should not claim credit for it. I tried to explain how IBM provides added value and has full-time employees dedicated to N series development and support, but I doubt I have changed his mind.
Why do I bring that up? Because the EMC Disk Library runs OEM software from FalconStor. Basically EMC is assembling a hardware/software solution with components provided from OEM suppliers. Hmmm? Sound familiar? Who is calling the kettle black?
If there is a clear winner here, it is FalconStor itself. Perhaps one of the worst kept industry secrets is that FalconStor software is also used in VTL offerings from Sun, Copan, and IBM, the latter embodied as the [IBM TS7520 Virtualization Engine] offering. If you like the concept of an EDL, but prefer instead one-stop shopping from an "information infrastructure" vendor, IBM can offer the TS7520 along with servers, software and services for a complete end-to-end solution.
Can EMC claim to be "a leader" in Virtual Tape Libraries?
During the measured quarter, IBM shipped its 10 millionth LTO-4 tape drive cartridge to Getty Images, the world's leading creator and distributor of still imagery, footage and multi-media products, as well as a recognized provider of other forms of premium digital content, including music. Getty Images is using the LTO-4 drives as part of a tiered infrastructure of IBM disk and tape solutions that help support the backup needs of their digital imagery;
IBM shipped more than 1,500 Petabytes of tape storage in Q3'07 alone;
During Q3'07, IBM shipped the 10,000th IBM System Storage TS3500 Tape Library. The TS3500 is a highly scalable tape library with support from 1 to 192 tape drives and up to 6,400 cartridge slots for open system, mainframe and virtual tape system attachment.
Let's take a look at the numbers. IBM has sold over 5,400 virtual tape libraries. Sun/STK has sold over 4,000 virtual tape libraries. Both are drastically more than the 1,100 mentioned in Chuck's post. Does IDC recognize EMC in third place? No, EMC chooses instead to declare EDL as disk arrays (probably to prop up their IDC "Disk Tracker" numbers), so they don't even earn an honorable mention under the virtual tape library category. This of course includes the number of mainframe-attached models from IBM and Sun/STK. So, if EMC did call these tape systems instead, they might show up in third place, and as such EMC could claim to be "a leader" in much the same way an athlete can claim to be an "Olympic medalist" winning the bronze for third place. (If you limit the count to just the FalconStor-based models from IBM, EMC, Sun and Copan, then EMC moves up to first or second, but then press release titles like "EMC a Leader in FalconStor-based non-mainframe Virtual Tape Libraries" can get too confusing.)
Chuck, if you are reading this, I feel you have every right to celebrate your involvement with the EDL. Despite having common software and hardware components, both IBM and EMC can rightfully declare their own unique value-add through their respective VTL offerings. Like the IBM N series, the EMC Disk Library is not diminished by the fact the software was written by someone else. BarryB might disagree.
Last year, in my post [Inaugural Brand Impact 2007 Awards], I mentioned how IBM beat out other major storage vendors for the best brand "IBM System Storage". I am proud of this, and highlighted it as one of my team's key accomplishments during my brief 20-month career in marketing, which I recapped in my post [Switching Over from What and Why] when I switched over to consulting.
This year, IBM did it again. For a second consecutive year, IBM System Storage was recognized by [Liquid Agency]as the leading brand for enterprise storage. Here is an excerpt from the [IBM Press Release]:
"IBM System Storage is the most trusted storage portfolio in the world, providing our clients leading disk, tape and storage software solutions and services. This award reflects IBM's priority in delivering information infrastructure solutions to solve our client's most critical storage challenges," said Barry Rudolph, Vice President, IBM System Storage. "We are helping clients -- from large corporations to small businesses -- intelligently manage information as a strategic business asset. We are proud to be recognized as the clear market leader in delivering solutions that help our clients manage and extract value from their information."
Liquid Agency reviewed over 250 technology brands to make this assessment.
The Business/IT Alignment category is critical for many companies; getting these two key divisions in sync provides a huge competitive advantage. This year’s winner – by a landslide – is IBM's [Innov8].
This Big Blue product has a touch of the sci-fi to it: it’s an interactive, 3-D business simulator intended to close the divide between IT staff and business executives. In other words, it’s…a video game. I guarantee you that in all the decades that Datamation has done its Product of the Year awards, never has a video game won. The times they are a-changin’.
Whether a server is the “best” server is, in truth, based on your company’s individual needs and budgets. In the server world, with its myriad options and add-ons, one size definitely does not fit all. That said, IBM p 570 Server must fit plenty of needs; the box easily won the Enterprise Server category. IBM claims this workhorse doubles the speed of its predecessor without requiring a larger energy footprint.
IBM Lotus Symphony
When it comes to total numbers of users, there’s no question that Microsoft Office is the 800-pound gorilla of this category. The deeply entrenched Office makes the corporate world go ‘round. Given Office’s status, it’s a major eyebrow raiser that this category was won by relative newcomer IBM Lotus Symphony. Perhaps it’s because Big Blue’s product is free (that always helps), or because IBM is itself such an established vendor. Whatever the case, consider this vote as a huge upset.
(Note: IBM Lotus Symphony is available for [free download] for Windows and Linux. When my friend purchased a new laptop that came pre-installed with Windows Vista, he was surprised to see that Microsoft Office was not included. I pointed him to Lotus Symphony, and he is running great with his existing Word, Powerpoint and Excel documents! I use Lotus Symphony on both Windows and Linux, and IBM plans to make a version available for Mac OS X -- when that happens, I have my Mac Mini G4 waiting to try it out.)
IBM Wireless Software for Business Intelligence (BI) on the go
For most of 2007, IBM Cognos 8 Go! Mobile software supported only Blackberry units. At the end of last year, Cognos upgraded its wireless business intelligence software – which delivers business reports to on-the-go staffers – to support handhelds that run Windows Mobile OS. Naturally, this expanded the company’s user base, and likely helped Cognos 8 Go! Mobile win the Wireless Software category.
(If you have a RIM Blackberry handheld device, you can try out this [actual demo].)
Wow! That's a lot of awards. Congratulations to all my IBM colleagues who made this happen!
Wrapping up my week on the Feb 12 announcements, I will finish off talking about the new Half-High (HH) LTO4 drives available for our TS3100 and TS3200 tape libraries.
Small and medium sized business (SMB) clients are looking for small, affordable tape systems. Tape is inherently green, using orders of magnitude less energy than disk, and is very scalable by simply purchasing more tape cartridges.
When IBM first announced them, the TS3100 supported one drive with 24 cartridges, and the TS3200 (see picture at left) supported two drives and 48 cartridges. Unlike disk, where vendors quote raw capacity and then lower it to indicate usable capacity in RAID configurations, tape is just the opposite. LTO4 cartridges have 800 GB raw capacity, but with an average of 2:1 compression, can hold a usable 1.6 TB of data. LTO4 also supports WORM cartridges for non-erasable, non-rewriteable (NENR) types of data, and encryption capability.
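For a sense of scale, here is the arithmetic for a fully populated library, using only the cartridge counts and LTO4 figures above. The function name is mine, and your actual compression ratio will vary with your data, so treat the compressed numbers as a rough guide.

```python
LTO4_RAW_GB = 800       # raw capacity per LTO4 cartridge
COMPRESSION_RATIO = 2   # the "average" 2:1 ratio; real data will vary

LIBRARY_SLOTS = {"TS3100": 24, "TS3200": 48}


def library_capacity_tb(cartridges, compressed=True):
    """Capacity in TB (decimal) of a library filled with LTO4 cartridges."""
    ratio = COMPRESSION_RATIO if compressed else 1
    return cartridges * LTO4_RAW_GB * ratio / 1000


if __name__ == "__main__":
    for model, slots in LIBRARY_SLOTS.items():
        print(f"{model}: {library_capacity_tb(slots, compressed=False):.1f} TB raw, "
              f"{library_capacity_tb(slots):.1f} TB at 2:1 compression")
```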
As a follow-on to our HH LTO3 drives, IBM is the first major storage vendor to offer the new HH LTO4 drives in entry-level automation, which directly attach via 3Gbps SAS connections to your host servers. The HH models allow you to have two drives in the TS3100, and four drives in the TS3200.
You can mix and match, LTO3 and LTO4. Why would anyone do that? Well, the Linear Tape Open [LTO] consortium -- made up of technology provider companies IBM, HP and Quantum -- decided to support N-2 generation read, and N-1 generation read/write. So, an LTO3 can read LTO1 cartridges, and read/write LTO2 and LTO3 cartridges. The LTO4 can read LTO2 cartridges, and read/write LTO3 and LTO4 cartridges. For SMB customers that still have some LTO1 cartridges they might want to read some day, mixing LTO3 and LTO4 is a viable combination.
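The N-2 read, N-1 read/write rule is easy to express in a few lines. This sketch encodes just the rule as described above; the function names are mine and it ignores details such as WORM or encrypted cartridges.

```python
def can_read(drive_gen, cartridge_gen):
    """A drive reads its own generation and the two before it (N, N-1, N-2)."""
    return cartridge_gen >= 1 and drive_gen - 2 <= cartridge_gen <= drive_gen


def can_write(drive_gen, cartridge_gen):
    """A drive writes its own generation and the one before it (N, N-1)."""
    return cartridge_gen >= 1 and drive_gen - 1 <= cartridge_gen <= drive_gen


def readable_generations(drive_gens, newest=4):
    """Cartridge generations that at least one drive in the library can read."""
    return [c for c in range(1, newest + 1)
            if any(can_read(d, c) for d in drive_gens)]


if __name__ == "__main__":
    print("LTO4 only:    ", readable_generations([4]))
    print("LTO3 and LTO4:", readable_generations([3, 4]))
```

A library with only LTO4 drives cannot read LTO1 cartridges, while a mixed LTO3 and LTO4 library covers all four generations.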
Of course, IBM still offers full-high (FH) versions of LTO3 and LTO4, which offer a bit faster acceleration, back-hitch and rewind times than their HH counterparts, and also offer additional attachment choices of LVD Ultra160 SCSI and 4 Gbps Fibre Channel as well.
So, for SMB customers that are simply using their tape for backup and archive, and probably not driving maximum rated speeds, having twice as many slower drives might be just the right fit.
Today, I'll cover the announcements related to our IBM System Storage N series disk systems, which ties in with the Valentine's Day theme nicely. The phrase we use for "unified storage" is that N series allows you to "share the closet, not necessarily the clothes". Couples recognize the value of a shared closet over having one closet for just the man's clothes, and a separate closet for just the woman's clothes. (For some couples, the man's closet would be terribly underutilized!) By analogy, the N series allows you to share one solution for LUNs that can be accessed via FCP or iSCSI protocols, and NAS file systems that can be accessed via NFS and CIFS protocols. In most data centers, Windows and UNIX applications are about as likely to share files as men and women are to wear each other's clothes, so the analogy is intact.
Let's take a look at what got announced:
N7700 and N7900
There are actually [eight new high-end N series] models. The N7900 has 4 processors and 32GB of cache. The N7700 has 2 processors and 16GB cache. Each has two appliance models (A11 single node and A21 dual node) and two gateway models (G11 single node and G21 dual node).
The appliance models support both FC and SATA disk. The N7900 A models support a maximum of 1176 drives; the N7700 A models support 840 drives. The gateway models provide FCP, iSCSI and NAS host access through external disk attachment. The N7900 gateway models support 1176 LUNs on external disk systems; the N7700 gateway models support 840 external LUNs.
N series now supports 1 TB SATA disk
The [EXN1000 expansion drawer] can now have up to fourteen 1TB SATA drives. This is in addition to previous announcements supporting 500GB and 750GB drive capacities. These drawers support the entire N series line.
With 1 TB drives, the N7900 now supports up to 1176 TB of raw capacity, which is over 1PB of usable data in 12+2P RAID-DP mode. This is greater than the internal disk capacity limits of current IBM DS8000, EMC DMX and HDS USP-V models.
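To see where "over 1PB" comes from, here is a back-of-the-envelope Python sketch. It assumes every drive sits in a full 12+2P RAID-DP group and ignores hot spares, formatting overhead and the decimal-versus-binary distinction, so treat the result as an upper bound rather than a sizing figure. The function name is hypothetical.

```python
def usable_capacity_tb(drives, drive_tb, data_per_group, parity_per_group):
    """Usable capacity when drives are organized into full RAID groups.

    Assumption: only complete groups count, and no drives are set
    aside as spares.
    """
    group_size = data_per_group + parity_per_group
    full_groups = drives // group_size
    return full_groups * data_per_group * drive_tb


# N7900 with 1176 drives of 1 TB each, in 12 data + 2 parity groups
raw_tb = 1176 * 1
usable_tb = usable_capacity_tb(1176, 1, 12, 2)
print(raw_tb, usable_tb)
```

The 1176 drives divide evenly into 84 groups of 14, and 84 groups of 12 data drives give 1008 TB, just over one (decimal) petabyte.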
At the low end, both the N3300 and N3600 now support 500GB, 750GB and 1TB SATA drives in addition to the SAS drives they supported.
SnapManager for Microsoft SharePoint
There is a new SnapManager in town. This one is for Microsoft SharePoint data. See the announcement for the [N3300 and N3600] for details.
On Jan 24, IBM signed agreements with [Ingram Micro, Tech Data, and Synnex] to distribute the N Series products and work with IBM to recruit new solution providers to the line. These three are all well-respected world-class distribution providers, so we are glad to have increased our partnership with them on this.
Yesterday, I promised I would cover other products from the Feb 12 announcement. Today I will focus on the IBM SAN768B director. Some people are confused about the differences between switches and directors. I find there are three key differences:
Directors are designed for 24x7 operation, highly available with no single points of failure or repair. Generally, all components in directors are redundant and hot-swappable, including Control Processors. In switches, some components are redundant and hot-swappable (such as fans and power supplies), but not the “motherboard” or controller. Often you have to take down a switch to make firmware or major hardware changes or upgrades.
Directors are designed to take in "blades" with different features, port counts, or protocol capabilities. You can add or remove blades while the system is up and running. Switches have a fixed number of ports. (A Small Form-factor Pluggable optical transceiver [SFP] is the component that turns electric pulses into light pulses, and vice versa. You plug the SFP into the switch, and then the fiber optic cable is plugged into the SFP.)
With switches, you often start with a base number of active ports, and then can enable the rest of the ports as you need them.
Directors have hundreds of ports. Switches tend to have 64 ports or less.
Last year, Brocade acquired McDATA. Both were OEMs for IBM, and IBM distinguished that in the naming convention. The IBM SAN***B name was used to denote products manufactured for IBM by Brocade, and a SAN***M name was used to denote products manufactured by McDATA.
At that time, Brocade and McDATA equipment did not mix very well on the same fabric, so IBM retained the naming convention so that you as a customer knew what it worked with.
Brocade has now released new levels of both operating systems -- Brocade's FOS and McDATA's EOS -- and their respective fabric managers -- Brocade Fabric Manager (FM) and McDATA's Enterprise Fabric Connectivity Manager (EFCM) -- so that they have full interoperability.
Brocade's goal is to enhance EFCM to be a common software management platform for all of their products going forward.
IBM used the maximum port count in the name to provide some clue as to the size of the switch or director. The SAN16B-2 and the SAN32B-3 are switches that have a maximum of 16 and 32 ports, respectively. The SAN256B supports a maximum of eight blades of your choosing. Two different types were supported for FC ports, a 16-port blade and a 32-port blade. If all eight were 32-port blades then the maximum was 256 ports, hence the name. But then Brocade began offering 48-port blades. Should IBM change the name? No, it decided to leave it the SAN256B even though it can now have a maximum of 384 ports.
Not to confuse anyone, the SAN768B also has a maximum of 384 ports, in the same 14U dimensions, but with a special twist. Normally to connect two directors together you use up ports from each, in what are called "inter-switch links" (ISL). These are ports you are taking away from the servers and storage controllers. The SAN768B offers a new alternative called "inter-chassis links" (ICL). Each SAN768B has two processing blades, and each has two ICL ports, so with just four two-meter (2m) cables, you get the equivalent of 128 FC 8 Gbps ISL links without using 128 individual ports on each side. That is like giving you 256 ports back for use with servers and storage!
Since IBM directors require 240 volt power, the IBM TotalStorage SAN Cabinet C36 includes power distribution units (PDUs). PDUs are just glorified power strips, but a new intelligent PDU (iPDU) option adds the ability to monitor energy consumption, for customers looking to measure, and perhaps charge back, energy consumption to the rest of the business. You can stack two SAN768B in one cabinet, one on top of the other, and connected via ICLs, it would look like one huge 768-port backbone.
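The port arithmetic behind the names can be sketched in a few lines of Python. The numbers are the ones quoted above; the function and constant names are hypothetical and only there for illustration.

```python
def max_ports(blade_slots, ports_per_blade):
    """Maximum port count of a director filled with identical blades."""
    return blade_slots * ports_per_blade


# SAN256B: eight blade slots
san256b_original = max_ports(8, 32)  # with 32-port blades, hence the name
san256b_current = max_ports(8, 48)   # with the later 48-port blades

# SAN768B: 384 ports per chassis, two chassis joined by ICLs
SAN768B_PORTS_PER_CHASSIS = 384
stacked_backbone_ports = 2 * SAN768B_PORTS_PER_CHASSIS

# The four ICL cables stand in for 128 ISLs, and each ISL would
# otherwise consume one port on each of the two chassis.
ICL_EQUIVALENT_ISLS = 128
ports_given_back = ICL_EQUIVALENT_ISLS * 2

print(san256b_original, san256b_current)
print(stacked_backbone_ports, ports_given_back)
```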
As a backbone for your data center, the SAN768B is positioned for two emerging technologies:
8 Gbps Fibre Channel (FC)
The SAN768B is powerful enough to have 32-port blades run full speed on all ports off-blade without oversubscription. Oversubscription is an emotional topic.
Normally, blades (like switches) can handle all traffic at full speed without delays provided the in-bound and out-bound ports involved are all on the same blade. In a director, however, if you need to communicate from a port on one blade to a port on a different blade, it is possible that off-blade traffic might be constrained or delayed in its transit across the backplane.
On the SAN768B, both the 16-port and 32-port blades can run at full 8 Gbps speed, and the 48-port blade is exposed to oversubscription only if you have more than 32 ports running at full 8 Gbps transferring data off-blade concurrently.
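One way to put a number on oversubscription is the ratio of traffic offered by the ports to what the blade can push across the backplane. The Python sketch below assumes the off-blade capacity per blade is 32 ports times 8 Gbps, or 256 Gbps; that figure is inferred from the statement above, not taken from a Brocade specification, and the function name is hypothetical.

```python
def oversubscription_ratio(active_ports, port_gbps=8, offblade_gbps=256):
    """Offered off-blade traffic divided by off-blade capacity.

    A ratio of 1.0 or less means every port can run at full speed;
    anything above 1.0 means ports may have to wait for the backplane.
    offblade_gbps=256 is an assumption (32 ports x 8 Gbps).
    """
    return (active_ports * port_gbps) / offblade_gbps


for ports in (16, 32, 48):
    print(ports, oversubscription_ratio(ports))
```

Under that assumption, a fully loaded 48-port blade sending everything off-blade at 8 Gbps would be oversubscribed 1.5 to 1, while the 16-port and 32-port blades never exceed 1.0.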
The new 8 Gbps SFPs support auto-negotiation at N-1 and N-2 generation link speeds. This means that they will automatically slow down when communicating with 4 Gbps and 2 Gbps devices, but they cannot communicate with 1 Gbps devices. If you are still using 1 Gbps devices in your data center, you will need to use 4 Gbps SFPs (which also support 2 Gbps and 1 Gbps link speeds) to communicate with those older devices.
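The negotiation rule can be illustrated with a short Python sketch. It models only the rule stated above, where an SFP supports its own speed plus the two previous generations, each half the one before; it is not how the Fibre Channel link-speed negotiation protocol itself works, and the function names are hypothetical.

```python
def supported_speeds(sfp_gbps):
    """Link speeds an SFP supports: its own (N), N-1 and N-2."""
    return [sfp_gbps, sfp_gbps // 2, sfp_gbps // 4]


def negotiate(sfp_gbps, device_speeds):
    """Highest speed both sides support, or None if there is no match."""
    common = set(supported_speeds(sfp_gbps)) & set(device_speeds)
    return max(common) if common else None


print(supported_speeds(8))
print(negotiate(8, [4, 2, 1]))  # a 4 Gbps device
print(negotiate(8, [1]))        # a 1 Gbps-only device
print(negotiate(4, [1]))        # same device behind a 4 Gbps SFP
```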
Fibre Channel over Ethernet (FCoE)
Basically, this new technology enables transport of Fibre Channel packets over 10 Gbps Ethernet links. This 10 Gbps Ethernet can also be used to carry traditional iSCSI and TCP/IP traffic. FCoE introduces new extensions to provide Fibre Channel characteristics, like being lossless, and offering consistent performance. The ANSI T11 team is driving FCoE as an open standard, and at the moment it is not fully baked. I suggest you don't buy any FCoE equipment prematurely, as pre-standard devices or host bus adapters could get you burned later when the standard is finalized.
The idea is that FCoE blades can be installed in a SAN768B along with traditional FC blades, allowing routing of traffic between traditional FC and new FCoE ports. Those who have invested in FCIP for long distance replication will be able to continue using either FC or FCoE inputs.
One of the big drivers of FCoE is IBM BladeCenter. Currently, most BladeCenter blades support both Ethernet and FC connectivity and are connected to both Ethernet and FC switches on the back of each BladeCenter chassis. With FCoE, we have the potential to run both FC and IP traffic across simpler all-Ethernet blades, connecting through all-Ethernet switches on the backs of each chassis.
For more information on the IBM SAN768B, see the [IBM Press Release]. For more details on Brocade's strategy, here is an 8-page white paper on their [Data Center Fabric] vision.
It's Tuesday, and you know what that means -- IBM makes its announcements.
Today, IBM announced a variety of storage offerings, but I am going to focus this post on just the new DR550 models. The DR550 is the leading disk-and-tape solution for storing non-erasable, non-rewriteable (NENR) data. This type of data, often called fixed-content or compliance data, was previously written to Write-Once-Read-Many (WORM) optical media. However, optical technology has not advanced as fast as magnetic recording, so disk and tape have taken over this role. While there are still a few laws on the books that mandate "optical media" as the storage solution, new laws like SEC 17a-4 and Sarbanes-Oxley (SOX) allow for NENR solutions based on magnetic disk or tape instead.
As we had done for the IBM SAN Volume Controller (SVC), the DR550 was based on "off the shelf" components. The File System Gateway (FSG) was based on a System x server, and the DR550 hardware on a System p server and DS4000 disk arrays, with "hardened" versions of AIX, DS4000 Storage Manager and IBM Tivoli Storage Manager (TSM), the last of which we renamed the IBM System Storage Archive Manager (SSAM).
The DR550 is Ethernet-based, so it can be used with all IBM server platforms, from System x and BladeCenter, to System i, and System p, and even System z mainframe customers, as well as non-IBM platforms from Sun, HP and others. There are two ways to get data stored onto the DR550:
Sending archive objects via the SSAM archive API. This is an API based on the XBSA open standard that many applications have coded to.
Writing files via standard CIFS and NFS protocols through the File System Gateway (FSG), an optional priced feature that you can have incorporated into the DR550.
Generally, business applications like SAP or Microsoft Exchange don't do this directly, but rather you have an "archive management application" that acts as the go-between broker. IBM offers IBM Content Manager, IBM CommonStore for eMail (Exchange and Lotus Domino), and IBM CommonStore for SAP. IBM also recently acquired FileNet and Princeton Softech, which provide additional support. Third party products like Zantaz and Symantec KVS Enterprise Vault have also passed System Storage Proven certification for the DR550. These go-between applications understand the underlying storage structure of their respective applications, and can apply policies to extract database rows, individual emails, or other attachments, as appropriate, and either move or copy them into the DR550.
The DR550 has built-in support to move data from disk to tape, through policy-based automation behind the scenes. This is the key differentiator from disk-only solutions. Rather than filling up an EMC Centera, and watching it sit there idle burning energy for five to seven years, or however long you are required to keep the data, you can instead use the disk for the most recent months' worth of data on a DR550. The DR550 attaches to tape drives or libraries, not just IBM TS1120 or LTO based models, but hundreds of systems from other vendors as well. You can combine this with either rewriteable or WORM tape cartridge media, depending on your circumstances. This can be directly cabled, or through a SAN fabric environment. Storing the bulk of this rarely-referenced data on tape makes the DR550 substantially more affordable and more green than disk-only alternatives.
Let's take a look at the specific models:
IBM System Storage DR550 DR1
The DR1 machine-type-model replaces the "DR550 Express" for small and medium size business workloads. This is a single System p server with anywhere from 1 to 36 TB of raw disk capacity in a nice lockable 25U cabinet (see picture at left). On the original DR550 Express, the 25U cabinet was optional, but so many people opted for it that we made it a standard feature. You can add the File System Gateway, which is a System x running Linux with NFS and CIFS protocols converted to SSAM API calls.
IBM System Storage DR550 DR2
The DR2 machine-type-model replaces the larger "DR550" for enterprise workloads. This can be either a single or dual node System p configuration, anywhere from 6 to 168 TB in raw disk capacity, in a lockable 36U cabinet. This also allows for an optional File System Gateway, and in the case of the dual node configuration, you can have two System p servers, and two System x servers with two Ethernet and two SAN switches for complete redundancy.
Common Information Model (CIM) and SMI-S interfaces have been added so that IBM Director can provide a "single pane of glass" to manage all of the components of the DR550.
The system is based on high-capacity 750GB SATA drives, installed in half-drawer (eight drives, 6 TB) and full-drawer (16 drives, 12 TB) increments. Your choices will be 7+P RAID5 or 6+P+Q RAID6. Here is an Intel article that explains [RAID6 P+Q]. In the future, as new disk technologies are introduced, the DR550 supports moving the disk data from old to new seamlessly, without disrupting enforcement of the data retention policies.
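For a feel of what the two RAID choices cost in capacity, here is a small Python sketch. It assumes each eight-drive half-drawer forms one array (7+P or 6+P+Q), which matches the drive counts above but is my reading rather than a documented DR550 layout, and it ignores spares and formatting overhead. Names are hypothetical.

```python
DRIVE_GB = 750
DRIVES_PER_HALF_DRAWER = 8


def half_drawer_capacity_gb(parity_drives):
    """(raw, usable) GB for one eight-drive array.

    parity_drives is 1 for 7+P RAID5 and 2 for 6+P+Q RAID6.
    """
    raw = DRIVES_PER_HALF_DRAWER * DRIVE_GB
    usable = (DRIVES_PER_HALF_DRAWER - parity_drives) * DRIVE_GB
    return raw, usable


raid5 = half_drawer_capacity_gb(1)
raid6 = half_drawer_capacity_gb(2)
print("RAID5 7+P:  ", raid5)
print("RAID6 6+P+Q:", raid6)
```

Per half-drawer, RAID6 gives up one more drive's worth of capacity (750 GB) than RAID5 in exchange for surviving a second drive failure.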
For more information, here is a [6-page brochure] that has specifications for both the DR1 and DR2 models.
Rich Bourdeau has written a nice article on InfoStor titled [Software as a Service (SaaS) meets Storage]. Last year, IBM acquired Arsenal Digital, and he mentions both in this article. It is interesting how this has evolved over the years.
Rent warehouse space for tapes
I remember when various companies offered remote storage for tapes. These would be temperature and humidity-controlled rooms, with access lists on who could bring tapes in, who could take tapes out, and so on. In the event of a disaster, someone would collect the appropriate tapes and take them to a recovery site location.
Rent online/nearline storage from a Storage Service Provider (SSP)
SSPs rented storage space on disk, or provided automated tape libraries that could be written to, with tapes being ejected and stored in temperature/humidity-controlled vaults. Electronic vaulting eliminates a lot of the issues with cartridge handling and transportation, and is more secure and faster. Rented disk space, based on a Gigabytes-per-month rate, could be used for whatever the customer wanted. If these were for backups or archive, then the customer had to have their own software, do their own processing at their own location, send the data to the remote storage as appropriate, and manage their own administration.
Backup-as-a-Service and Archive-as-a-Service
We are now seeing the SaaS model applied to mundane and routine storage management tasks. New providers can offer the software to send backups, the disk to write them to, and as needed the tape libraries and cartridges to roll over to when the disk space is full. Disk capacity can be sized so that the most recent backups are immediately accessible for fast recovery.
The same concept can be applied to archives. The key difference between a backup and an archive is that backups are version-based. You might keep three versions of a backup, the most recent and two older copies, so that if something is wrong with the most recent copy, you can go back to older copies. This could be from undetected corruption of the data itself, or problems with the disk or tape media. An archive, on the other hand, is time-based. You want this data to be kept for a specific period of time, based on an event or fixed period of years.
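The difference between the two retention styles is easy to see in code. Below is a minimal Python sketch, with hypothetical function names and made-up dates; it is not how any particular backup or archive product implements its policies.

```python
from datetime import date


def split_backup_versions(version_dates, keep=3):
    """Version-based retention: keep the newest `keep` copies.

    Returns (kept, expired), each sorted newest first.
    """
    ordered = sorted(version_dates, reverse=True)
    return ordered[:keep], ordered[keep:]


def retention_end(start, years):
    """Time-based retention: the date on which the period ends."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # start was Feb 29 and the target year is not a leap year
        return start.replace(year=start.year + years, day=28)


def archive_may_be_deleted(start, years, today):
    """True once the retention period has fully elapsed."""
    return today >= retention_end(start, years)


backups = [date(2008, 2, 1), date(2008, 2, 8), date(2008, 2, 15), date(2008, 2, 22)]
kept, expired = split_backup_versions(backups, keep=3)
print(kept, expired)
print(archive_may_be_deleted(date(2008, 2, 19), 7, date(2012, 1, 1)))
```

A new backup pushes the oldest version out regardless of its age, while an archive object stays put until its clock runs out, no matter how many newer objects arrive.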
Since BaaS and AaaS providers know what the data is, and have some idea of what the policies and usage patterns will be, they can optimize a storage solution that best meets service level agreements.
Many people have asked me if there was any logic with the IBM naming convention of IBM Systems branded servers. Here's your quick and easy cheat sheet:
System x -- "x" for cross-platform architecture. Technologies from our mainframe and UNIX servers were brought into chips that sit next to the Intel or AMD processors to provide a more reliable x86 server experience. For example, some models have a POWER processor-based Remote Supervisor Adapter (RSA).
System p -- "p" for POWER architecture.
System z -- "z" for Zero-downtime, zero-exposures. Our lawyers prefer "near-zero", but this is about as close as you get to ["six-nines" availability] in our industry, with the highest level of security and encryption, no other vendor comes close, so you get the idea.
But what about the "i" for System i? Officially, it stands for "Integrated" in that it could integrate different applications running on different operating systems onto a [COMMON] platform. Options were available to insert Intel-based processor cards that ran Windows, or attach special cables that allowed separate System x servers running Windows to attach to a System i. Both allowed Windows applications to share the internal LAN and SAN inside the System i machine. Later, IBM allowed [AIX on System i] and [Linux on Power] operating systems to run as well.
From a storage perspective, we often joked that the "i" stood for "island", as most System i machines used internal disk, or attached externally to only a few selected models of disk from IBM and EMC that had special support for i5/OS using a special, non-standard 520-byte disk block size. This meant only our popular IBM System Storage DS6000 and DS8000 series disk systems were available. This block size requirement only applies to disk. For tape, i5/OS supports both IBM TS1120 and LTO tape systems. For the most part, System i machines stood separate from the mainframe, and the rest of the Linux, UNIX and Windows distributed servers on the data center floor.
Often, when I am talking to customers, they ask when product xyz will be supported on System z or System i. I explain that IBM's strategy is not to make all storage devices connect via ESCON/FICON or support non-standard block sizes, but rather to get the servers to use standard 512-byte block size, Fibre Channel and other standard protocols. (The old adage applies: If you can't get Mohamed to move to the mountain, get the mountain to move to Mohamed.)
On the System z mainframe, we are 60 percent there, allowing three of the five operating systems (z/VM, z/VSE and Linux) to access FCP-based disk and tape devices. (Four out of six if you include [OpenSolaris for the mainframe].) But what about System i? As the characters on the popular television show [LOST] would say: It's time to get off the island!
Last week, IBM announced the new [i5/OS V6R1 operating system] with features that will greatly improve the use of external storage on this platform. Check this out:
POWER6-based System i 570 model server
Our latest, most powerful POWER processor brought to the System i platform. The 570 model will be the first in the System i family of servers to make use of new processing technology, using up to 16 (sixteen!) POWER6 processors (running at 4.7 GHz) in each machine. The advantage of the new processors is the increased commercial processing workload (CPW) rating, 31 percent greater than the POWER5+ version and 72 percent greater than the POWER5 version. CPW is the "MIPS" or "TeraFlops" rating for comparing System i servers. Here is the [Announcement Letter].
Fibre Channel Adapter for System i hardware
That's right, these are [Smart IOAs], so an I/O Processor (IOP) is no longer required! You can even boot the Initial Program Load (IPL) directly from SAN-attached tape. This brings System i into the 21st century for Business Continuity options.
Virtual I/O Server (VIOS)
[Virtual I/O Server] has been around for System p machines, but is now available on System i as well. This allows multiple logical partitions (LPARs) to access resources like Ethernet cards and FCP host bus adapters. In the case of storage, the VIOS handles the 520-byte to 512-byte conversion, so that i5/OS systems can now read and write to standard FCP devices like the IBM System Storage DS4800 and DS4700 disk systems.
IBM System Storage DS4000 series
Initially, we have certified DS4700 and DS4800 disk systems to work with i5/OS, but more devices are in plan. This means that you can now share your DS4700 between i5/OS and your other Linux, UNIX and Windows servers, take advantage of a mix of FC and SATA disk capacities, RAID6 protection, and so on.
To call [IBM PowerVM] the "VMware for the POWER architecture" would not quite do it justice. In combination with VIOS, IBM PowerVM is able to run a variety of AIX, Linux and i5/OS guest images. The "Live Partition Mobility" feature allows you to easily move guest images from one system to another, while they are running, just like VMotion for x86 machines.
And while we are on the topic of x86, PowerVM is also able to present a Linux-x86 emulation base to run x86-compiled applications. While many Linux applications could be recompiled from source code for the POWER architecture "as is", others required perhaps 1-2 percent modification to port them over, and that was too much for some software development houses. Now, we can run most x86-compiled Linux application binaries in their original form on POWER architecture servers.
BladeCenter JS22 Express
The POWER6-based [JS22 Express blade] can run i5/OS, taking advantage of PowerVM and VIOS to access all of the BladeCenter resources. The BladeCenter lets you mix and match POWER and x86-based blades in the same chassis, providing the ultimate in flexibility.
While many are just becoming familiar with the end-user interfaces of Web 2.0, from blogs and wikis to FaceBook and FlickR, fewer may be familiar with the "information infrastructure" of servers and storage behind the scenes.
Last year, I bought an XO laptop under the One Laptop Per Child [OLPC] foundation's Give-1-Get-1 program and posted my impressions on this blog. One in particular, my post [Printing on XO laptop with CUPS and LPR], showed how to print from the XO laptop over to a network-attached printer. This caught the attention of the OLPC development team, who asked me to help them with another project as a volunteer. Before accepting, I had to learn what skills they were really looking for, especially since I do not consider myself an expert in either printing or networking.
(Unlike a regular 9-to-5 job where most people just try to look busy for eight hours a day, doing volunteer work means being ready to ["roll up your sleeves"] and actually accomplish something. This applies to any kind of volunteer work, from hammering nails for [Habitat for Humanity] to sorting cans at the [Community Food Bank]. Best Buy uses the phrase "Results Oriented Work Environment" [ROWE] to describe their latest program, modeled in part after the mobile workforce policies of Web2.0-enlightened companies IBM and Sun, but that is perhaps a topic for another blog post!)
Apparently, to support a school full of students with XO laptops, it would be nice to have a few servers that provide support to manage the class lesson plans, make reading materials and other content available, and keep track of results. What they need is an "information infrastructure"! They decided on two specific servers:
School Server -- this would run a popular class management system called [Moodle]
Library Server -- a server for a digital library collection, based on Fedora Commons [16-minute video]
In keeping with the OLPC philosophy of using free and open source software [FOSS], both servers are based on the [LAMP] platform. LAMP is an acronym for the combined software bundle of Linux, Apache, MySQL and a Programming language like PHP. The "XS" team working on the school server wanted me to build a LAMP server and install Moodle to help test the configuration, determine what other software is required, and perhaps develop a backup/recovery scenario. Basically, they needed someone with Linux skills to put some hardware and software together.
(I am no stranger to Linux. Back in the 1990s, I was part of the Linux for S/390 team, led the effort to create the infamous "compatible disk layout" (CDL) that allows z/OS to access ESCON and FICON-attached Linux volumes, took my LPI certification exam, and led a team to validate FCP drivers for our disk and tape storage systems. For an IBMer to volunteer for an Open Source community project, you have to take an "open source" class and get management approval to review for any possible "conflicts of interest". I got this all taken care of, and accepted to help the XS team.)
Building a test environment is similar to baking a cake. You have a recipe, utensils, and ingredients. Here's a bit of description of each of the ingredients:
Like Windows, the Linux operating system comes in different flavors to run on handhelds, desktops and servers. For servers, IBM tends to focus on Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES). However, the XS team decided instead to use [Fedora 7], a community-supported version from Red Hat. Earlier versions of Fedora were known as "Fedora Core", but apparently with version 7, the word "Core" has been dropped. Fedora 7 can be used in either desktop or server mode.
[Apache] is web server software, and half of all web servers on the internet use it. It competes head-on against Microsoft's Internet Information Services (IIS) server provided in Windows 2003. The Apache name is partly from the fact that its origins were "a patchy" variant of the NCSA HTTPd 1.3 codebase. The popular [IBM HTTP Server] is powered by Apache, with added support for the rest of the IBM WebSphere software portfolio. The XS team chose Apache v2 as the web server platform.
[MySQL] is relational database management system (RDBMS) software, similar to commercial products like IBM DB2 Universal Database, Oracle DB, or Microsoft SQL Server. The SQL stands for Structured Query Language, developed by IBM in the early 1970s as a standard language to update and query database tables. MySQL comes in two flavors, MySQL Enterprise for commercial use, and MySQL Community, which is community-supported. There are over 10 million instances of MySQL running websites on the internet, which helps explain why Sun Microsystems agreed to acquire the company MySQL AB last month. The XS team decided on MySQL 5.0 as the database platform.
Making HTML pages dynamic, including the possibility to add or query database contents, requires programming. A variety of web scripting languages were developed, all starting with the letter "P" to claim to be the programming part of the LAMP platform, including [PHP], Perl, and Python. Later, new programming language frameworks were developed that do not start with the letter "P", like [Ruby on Rails]. PHP is short for PHP: Hypertext Preprocessor, which explains that it pre-processes HTML during web serving, looking for special tags indicating PHP code, allowing programming logic to insert HTML content, such as information extracted from a database. While Python is the language that runs the Sugar interface on the XO laptops, the XS team decided on PHP v5 as the programming language for the server.
As for utensils, you only need a few utilities
A simple text editor: I go old-school and use the classic "vi" (to learn this editor, see the ["Cheat Sheet" method] on IBM developerWorks)
secure socket shell (SSH): this allows you to access one server from another
browser access to the internet: when you encounter problems, get error messages, or whatever, it pays to know how to search for things with Google
As for a recipe, the Moodle website spells out some unique details and parameters. For the base LAMP platform, I chose to follow the book [Fedora 7 Unleashed] that has specific chapters on setting up SSH, Apache, MySQL, PHP, Squid and so on. The resulting configuration looks like this:
Here was the sequence of events:
I took an old PC that I wasn't using anymore, backed up the Windows system, and installed Linux on top. The book above had a Fedora 7 DVD on the back jacket, but I used the [OLPC LiveCD] that had some values pre-configured.
Set the IP address static. I set mine to 192.168.0.77 which nobody sees except my other systems.
My school server is "headless", which means it does not have its own keyboard, video or mouse. It also runs only to Linux run level 3, command line interface only, no graphics. I was able to share using a [KVM switch], but this meant having to remember something on one screen while I was switching over to the other. My Windows XP system has my browser connection to the internet to follow instructions or read error messages, so I need that up all the time. To get around this, on my Windows XP system, I generated SSH public and private keys, copied the public key over to my new Linux system, and used [OpenSSH for Windows] to connect over. Now, on one screen, I have my Windows XP Firefox browser, and a separate command line window that is accessing my Linux school server.
With SSH up and running, I can now use "vi" to edit files, and issue commands to install or activate the remaining software. First up, Apache. I got this working, and from Windows XP, verified that going to "http://192.168.0.77" showed the Apache test screen.
I installed PHP, and tested it with a simple short index.php file.
I installed MySQL, set up the base "installation databases", and created a test database. Here is where you might want to set a password for the MySQL root user, but I chose to do that later.
I installed Moodle. It was smart enough to check that Apache, PHP, and MySQL were operational, and apparently I missed a few special "PHP" modules that had to be linked in. I was able to find them, download them, and get them installed.
I brought up Moodle, created a "class category" of SCIENCE and a new class "Chemistry 101", and it all worked.
I also activated Squid, which is a web proxy cache server that stores web pages for faster access.
Another idea was to activate Samba, to provide CIFS file and print sharing, but I decided to put this off.
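After a sequence of steps like these, a quick way to confirm from another machine that each service is actually listening is to try a TCP connection to its port. Here is a minimal Python sketch of that idea. The port numbers are the usual defaults (SSH 22, Apache 80, MySQL 3306, Squid 3128) and are an assumption about this particular setup; the function names are hypothetical, and a successful connection only shows that something is listening, not that the service is configured correctly.

```python
import socket

# Usual default ports; an assumption, adjust to match the actual server.
DEFAULT_SERVICES = {"ssh": 22, "apache": 80, "mysql": 3306, "squid": 3128}


def port_is_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, unreachable, or name lookup failed
        return False


def check_services(host, services=DEFAULT_SERVICES):
    """Map each service name to whether its port accepted a connection."""
    return {name: port_is_open(host, port) for name, port in services.items()}


# Example, using the static address from the steps above:
# for name, up in check_services("192.168.0.77").items():
#     print(name, "listening" if up else "not reachable")
```

Note that MySQL is often configured to listen only on the local machine, in which case it would show as not reachable from another system even though it is running.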
I got all of this done last Saturday, start to finish. Now the fun begins. We are going to run through some tests, document the procedures, and try to get a system up and running in a remote school in Nepal. For now, I have only one XO laptop to simulate what the student sees, and one laptop that can represent either a teacher's Windows-based laptop, or run QEMU and emulate a second XO laptop. For tuning, I might go through the procedures mentioned on IBM developerWorks "Tuning LAMP" [Part 1, Part 2, Part 3].
For those in the server or storage industry that need to understand Web 2.0 information infrastructure better, building a LAMP server like this can be quite helpful.
Are you covering the business impact of the internet failure across Asia, the Middle East and North Africa? The outage has brought business in those regions to a standstill. This disaster shines a direct spotlight on the vulnerability of technology and serves as a reminder of the ever increasing importance of protecting business critical information.
Disaster recovery needs to be a critical element of every technology plan. We don’t yet know the financial impact of this wide spread internet failure, but the companies with disaster recovery plans in place, were likely able to failover their entire systems to servers based in other regions of the world.
When I first heard of this outage, I am thinking, so a few million people don't have access to FaceBook and YouTube, what's the big deal? We in the U.S.A. are in the middle of a [Hollywood writer's strike] and don't have fresh new television sitcoms to watch! Yahoo News relays the typical government's response:[Egypt asks to stop film, MP3 downloads during Internet outage], presumably so that real business can take priority over what little bandwidth is still operational. Fellow IBM blogger "Turbo" Todd Watson pokes fun at this, in his post[Could Someone Please Get King Tutankhamun On The Phone?].Like us suffering here in America, perhaps our brothers and sisters in Egypt and India may getre-acquainted with the joys of reading books.
However, the [Internet Traffic Report-Asia] shows how this impacted various locations including: Shanghai, Mumbai, Tokyo, Tehran, and Singapore. In some cases, you have big delays in IP traffic, in other cases, complete packet loss, depending on where each country lies on the["axis of evil"].This is not something just affecting a few isolated areas, the impact is indeed worldwide. This would be a goodtime to talk about how computer signals are actually sent.
DWDM takes up to 80 independent signals, converts each to a different color of light, and sends all the colors down a single strand of glass fiber. At the receiving end, the colors are split off by a prism, and each color is converted back to its original electrical signal.
Similar to DWDM, but only eight signals are sent over the glass fiber. This is generally cheaper, because you don't need highly tuned lasers.
Wikipedia has a good article on [Submarine Communications Cable], including a discussion on how repairs are made when they get damaged or broken. It is important to remember that lost connectivity doesn't mean lost data, just lack of access to the data. The data is still there, you just can't get to it right now. For some businesses, that could be disruptive to actual operations. In other cases, it means that backups or disk mirroring are suspended, so that you only have your local copies of data until connectivity is resumed.
When two cables in the Mediterranean were severed last week, it was put down to a mishap with a stray anchor.
Now a third cable has been cut, this time near Dubai. That, along with new evidence that ships' anchors are not to blame, has sparked theories about more sinister forces that could be at work.
For all the power of modern computing and satellites, most of the world's communications still rely on submarine cables to cross oceans.
It gets weirder. In his blog Rough Type, Nick Carr's [Who Cut the Cables?] reports that a fourth cable has now been cut, in a different location than the other two cable locations. If the people cutting the cables are looking to see how much impact this would have, they will probably be disappointed. Nick Carr relates how resilient the whole infrastructure turned out to be:
Though India initially lost as much as half of its Internet capacity on Wednesday, traffic was quickly rerouted and by the weekend the country was reported to have regained 90% of its usual capacity. The outage also reveals that the effects of such outages are anything but neutral; they vary widely depending on the size and resources of the user.
Outsourcing firms, such as Infosys and Wipro, and US companies with significant back-office and research and development operations in India, such as IBM and Intel, said they were still trying to assess how their operations had been impacted, if at all.
Whether it is man-made or natural disaster, every business should have a business continuity plan. If you don't have one, or haven't evaluated it in a while, perhaps now is a good time to do that. IBM can help.
According to Gartner data (from 2005!), host-based storage accounts for 34 percent of the overall market for external storage, with the remaining 66 percent going to "fabric-attached" (network) storage; expect this share to grow from 66 percent to 77 percent by 2007. What is the current reality? SAN vs. NAS, FC vs iSCSI?
IBM subscribes to a lot of data from different analysts, and they all have their own methods for collecting it, from taking surveys of customers to reviewing the financial results of each vendor. While they might not agree entirely, there are some common threads that lead one to believe they represent "reality". Here are some numbers from an IDC December 2007 report:
Worldwide Disk Storage
While the 32/68 split is similar to the 34/66 split you mentioned before, you can see that external storage is growing faster, so internal host-based storage will drop to 25 percent by 2011, with external storage growing to 75 percent, very close to the 77 percent predicted. Looking at just the external disk storage, there are basically three kinds: DAS (direct cable attachment), NAS (file-level protocols such as NFS, CIFS, HTTP and FTP), and SAN (block-level protocols like FC, iSCSI, ESCON and FICON):
Worldwide External Disk Storage
At these rates, fabric-attached (SAN and NAS) will continue to dominate the storage landscape. Looking more closely now at the block-oriented protocols:
Worldwide External Disk Storage
Fibre Channel (FC)
At these rates, iSCSI will overtake FC by 2011. IBM System Storage N series, DS3300 and XIV Nextra all support iSCSI attachment.
Jon Toigo over at DrunkenData offers some additional data from an ex-STKer: [Fred Moore Outlook on Storage 2008]. I met Fred at a conference. He had left STK back in 1998, and started his own company called Horison. Neither Jon nor Fred cites the sources of these statistics, but the following comment leads me to assume he hasn't been paying close attention to the tape market:
With the demise of STK, who will be the leader in the tape industry?
Depending on how old you are, you might remember exactly where you were when a significant event occurred, for example the [Space Shuttle Challenger] explosion. For many IBMers, it was the day our friends at Sun Microsystems announced they were [putting our lead tape competitor out of its misery]. I was in New York that day, but there was still some confetti on the floor in the halls of the IBM Tucson lab when I got home a few days later. IBM has been the number one market share leader in tape for over four years.
Last July, IBM and EMC traded blog postings over SPC-1 benchmark results. Fellow EMC blogger Chuck Hollis wrote his post [Does Anyone Take The SPC Seriously?]. Here is an excerpt:
I think most storage users have figured this out. We've never done an SPC test, and probably will never do one. Anyone is free, however, to download the SPC code, lash it up to their CLARiiON, and have at it.
I responded with [Getting Under EMC Skin], and then followed up with a series explaining IBM SVC and SPC benchmarks here:
So what is the good news? Yesterday, our friends at NetApp took up Chuck's challenge and posted results on their FAS3040 as well as their EMC CLARiiON devices. IBM sells the FAS3040 under the name IBM System Storage N5300 disk system. Knowing that NetApp maintains excellent performance when it is doing point-in-time copies, NetApp ran the benchmark both with and without point-in-time copies on both boxes. I include DS4700 and DS4800 as well for comparison purposes, but only have results for them without FlashCopy running.
NetApp FAS3040 (IBM N5300)
NetApp FAS3040 (IBM N5300)
EMC CLARiiON CX3-40
IBM DS4700 Express
EMC CLARiiON CX3-40
One would expect some performance degradation with a box running point-in-time copies at the same time it is reading and writing data. The NetApp/IBM N5300 does not degrade by much, but EMC's drops a significant amount.
So what is the bad news? Last October, I welcomed HDS USP-V to the [Super High-End Club], but now we need to invite Texas Memory Systems as well. In 2006, I posted [Hybrid, Solid State and the future of RAID], and poked fun at Texas Memory Systems for using the slogan "World's Fastest Storage" at a time when that honor belonged to IBM SAN Volume Controller instead. The VP of Texas Memory Systems, Woody Hutsell, explained that the only reason their solid-state disk system, RAMSAN-320, didn't have faster results is that they didn't have the fastest IBM server to run against it. It may not surprise you that nearly everyone's SPC benchmarks use IBM servers because IBM has the fastest servers as well. I didn't have a million-dollar System p UNIX server to send Woody for this, but it looks like they have finally gotten one, and a new RAMSAN-400 device, as they have posted their latest results.
Texas Memory Systems RAMSAN-400
IBM SAN Volume Controller 4.2
EMC doesn't publish numbers for their Symmetrix box, despite their announcement of faster SSD drives. They claim that SSD drives make their overall disk system performance faster, but without SPC benchmarks, we will never know. If you have a Symmetrix, this YouTube video may help you decide where it belongs:
IBM came out with their latest "5 in 5". These are five predictions for technologies that will have an impact over the next five years, summarized on 5 pages. Before I give my take on this year's set, here is a quick recap of [Last Year's 5 in 5]:
3-D representations of the human body to improve health care
This prediction is based on the idea that most medical mistakes result from lack of information about the patient. A 3-D avatar of the patient would allow the doctor to click on a section of the body, and this would trigger retrieval of patient records, relevant X-rays, MRI images, and so on. For example, IBM System Storage Grid Medical Archive Solution (GMAS) provides the storage that would allow any doctor to access these records, even if the image was taken at a different facility.
Unfortunately, this prediction only applies to patients who can actually afford to see a doctor. Apparently, no amount of technology, no matter how cool it is, can convince governments to make health care something everyone has access to. Michael Moore has done a good job explaining this in his film documentary [Sicko].
Digital passport for food
Using RFID tags and second generation barcodes, you will have access to details of a food's origin, transportation conditions, and impact to the environment. Much of this information is already gathered, just not stored in a database accessible to the consumer.
Last year, the term "locavore" was the 2007 Word of the Year for the Oxford American Dictionary, referring to people who limit what they eat to food produced within a certain radius, from family farms and locally-owned businesses. Here is an excerpt from a [Locavores] website:
Our food now travels an average of 1,500 miles before ending up on our plates. This globalization of the food supply has serious consequences for the environment, our health, our communities and our tastebuds.
Certainly, I am all for selling storage capacity to the food industry to help store vast amounts of information for this, and certainly some people will be able to make smarter decisions based on this information. This is not the first time this idea has come up. The U.S. Food and Drug Administration introduced [nutrition labeling requirements] in the hope that people would choose healthier foods. Despite this, people still opt for white bread, iceberg lettuce, and processed meats, so possibly having more information about where food comes from, and how it was transported, may not mean much to some consumers.
Technology to manage your own carbon footprint
"Smart energy" technologies allow you to walk the talk, by managing your own carbon footprint in your home. For example, if you forgot to turn off the heat or air conditioner before leaving the house on your commute to work, your home would call your mobile phone, so that you can turn around and go back and correct that mistake. Better yet, IBM is working with others to provide web-enabled electric meters that would allow you to turn off systems from a work or cell phone browser.
Of course, such technology already exists for the data center. IBM Systems Director Active Energy Manager (AEM) allows you to monitor the actual usage of your servers and storage devices, and in some cases make adjustments to control energy consumption. This can feed into the IBM Tivoli Usage and Accounting Manager software to incorporate energy usage as part of the charge-back calculations. See the [IBM Press Release] for more details.
Cars that drive themselves
Not only will cars that drive themselves reduce the number of drunk-driving accidents, they can also help reduce congestion in big cities, by routing traffic in different directions, based on GPS and presence-aware technologies. Stockholm (Sweden) has already reduced peak hour traffic by 20 percent using this approach.
While I admire the concept, cars are perhaps the least energy-efficient mode of transportation. Often, a family can only afford a single vehicle, and it is purchased based on the worst-case scenario. A friend of mine has only two children, but a seven-person mini-van that gets only 17 MPG. Why such an energy-inefficient vehicle? Because she occasionally drives her daughter and her friends to soccer practice, and that represents the worst-case scenario, minimizing the parent/child ratio. The other 99 percent of the time, she is driving by herself, or with one child, and consuming a lot of gasoline in the process.
A better approach would be to find technology that connects airports, trains, buses and light rail for public transportation to greatly reduce the need to drive a car in the first place.
The idea that a family can have only one vehicle plays in the storage arena as well. Larger companies can afford to have different storage for different workloads: the IBM System Storage DS8000 high-end disk system for their large OLTP and database workloads, an XIV Nextra for their Web 2.0 storage needs, a DR550 to hold their compliance data, and so on. Smaller companies are often tasked to find a single solution for all their needs, and for them, IBM offers the IBM System Storage N series, providing a "unified storage" platform.
Increased dependence on cell phones
Before the cell phone, the last don't-leave-home-without-it technology most of us carried was the credit card. Now, IBM predicts that we will be even more dependent on our cell phones, which will become our banker, ticket broker, and shopping buddy. For example, you could use your cell phone to take a picture of a shirt at the mall, and it will then show you what you would look like wearing that shirt, on a 3-D avatar representation of yourself, or perhaps your spouse, and get information on what discounts are available, or where else the shirt is being offered.
None of this example actually uses the "phone" part of the cell phone. However, the cell phone is the one device that nearly everyone carries, so it becomes the development platform for all other technologies to be based on.
The common theme running through these is that it can be helpful to store more information than we do today, provided we make it accessible to the people who need it to make better decisions.
Last week, I got the following comment from Bob Swann:
I am looking for the IBM VM Poster or a picture of the IBM VM "Catch the Wave"
Do you know where I might find it?
Well, Bob, I made some phone calls. The company that published these posters no longer exists, but I found a coworker at the Poughkeepsie Briefing Center who still had the poster on his wall, and he was kind enough to take a picture of it for you.
VM: The Wave of the Future (click thumbnail at left to see larger image)
Some may recognize this as a [mash-up] using as a base the famous Japanese 10-inch by 15-inch block print [The Great Wave off Kanagawa] by artist [Katsushika Hokusai]. I had this as my laptop's wallpaper screen image until last year when I was presenting in Kuala Lumpur, Malaysia. I was told that it reminded people of the horrible tsunami caused by the [Indian Ocean earthquake] back in 2004. I was actually scheduled to fly the last week of December 2004 to Jakarta, Indonesia, but at the last minute our client team changed plans. I would have been en route over the Pacific Ocean when the tsunami hit, and probably stranded over there for weeks or months until the airports re-opened.
The Wave theme was in part to honor the IBM users group called World Alliance VSE VM and Linux (WAVV), which is having their next meeting [April 18-22, 2008] in Chattanooga, Tennessee. I presented at this conference back in 1996 in Green Bay, Wisconsin, as part of the IBM Linux for S/390 team. It started on the Sunday that Wisconsin switched their clocks for [Daylight Saving Time], and the few of us from Arizona or other places that don't bother with this all showed up for breakfast an hour early.
When I was in Australia last year, I was told the wave that sports fans do, by raising their hands in coordinated sequence, was called the [Mexican Wave] in most other countries. When I was there, Melbourne was trying to outlaw this practice at their cricket matches.
The "wave" represents a powerful metaphor, from the z/VM operating system on System z mainframes to VMware and Xen on Intel-based machines, for the direction virtualization is taking in future data centers. The Mexican wave represents a glimpse of what humans can accomplish with collaboration on a global scale. It can also represent the tidal wave of data arising from nearly 60 percent annual growth in storage capacity. (I had to mention storage eventually, to avoid being completely off-topic on this post!)
I hope this is the graphic you were looking for, Bob. If anyone else has wave-themed posters they would like to contribute, please post a comment below.
While EMC bloggers garnered media attention last year pointing out the faulty mathematics from HDS, an astute reader pointed me to EMC's own [DMX-4 specification sheet], updated for its 1TB SATA disk. I've chosen just the minimum and maximum number of drives RAID-6 data points for non-mainframe platforms:
In the first two rows, the numbers appear as expected. For example, 96 drives would be 12 sets of 6+2 RAID ranks, meaning 72 drives' worth of data, so nearly 36TB for 500GB drives, and nearly 72TB for 1TB drives. With 14+2 RAID-6, you would have 84 drives' worth of data, so 42TB and 84TB respectively match expectations.
Where EMC appears to miscalculate is with 20x more drives, as the numbers don't match up. For 1920 drives in RAID-6, you would expect 20x more usable capacity than the 96-drive configurations. For 6+2 configurations, one would expect 720TB and 1440TB, respectively. For 14+2 configurations, one would expect 840TB and 1680TB, respectively.
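The expected figures above are simple arithmetic, and a short sketch makes them easy to re-check. This is only an illustration: the function name `raid6_usable_tb` is my own, and it ignores spares and formatting overhead (which is why the post says "nearly" 36TB), so it gives upper bounds rather than anything from EMC's specification sheet.

```python
def raid6_usable_tb(total_drives, data_per_rank, drive_tb, parity_per_rank=2):
    """Upper bound on usable TB; ignores spares and formatting overhead."""
    rank_width = data_per_rank + parity_per_rank
    ranks = total_drives // rank_width  # only whole ranks are counted
    return ranks * data_per_rank * drive_tb

# Drive counts, rank layouts and drive sizes are the ones quoted in the post.
for drives in (96, 1920):
    for data in (6, 14):
        for drive_tb in (0.5, 1.0):
            usable = raid6_usable_tb(drives, data, drive_tb)
            print(f"{drives} drives, {data}+2, {drive_tb}TB drives: {usable:g} TB")
```

Because the rank layout is the same at both drive counts, the 1920-drive results come out at exactly 20 times the 96-drive results.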
Perhaps EMC DMX-4 can't address more than 600TB for the entire system? Does EMC purposely limit the benefits of these larger drives? It does raise the question of why someone might go from 500GB to 1TB drives, if the maximum configuration only gives about 40TB more capacity. Fellow IBM blogger Barry Whyte questioned the use of SATA in an expensive DMX-4 system, in his post [One Box Fits All - Or Does It], and now perhaps there are good reasons to question 1TB from a capacity perspective as well.
Today is Tuesday, a good day for announcements and good news!
This week I am in Guadalajara, Mexico, and the focus in Mexico is Small and Medium sized Business (SMB). SmallBusinessComputing.COM put out their [2008 Awards: The Absolute Best in Small Business], and IBM disk and server systems were recognized. Here is an excerpt:
Network Storage As companies expand, so does the data, and often at an alarming rate. Adding dedicated storage to your network can ease both system performance and efficiency woes, making your work life a bit easier.
This year, 42 percent of our readers cast their lot with the [IBM System Storage DS3400]. The $6,495 system supports 12 hard disk drives for capacity of up to 3.6 terabytes, a good match for tasks such as managing databases, e-mail and Web serving.
Last year's winner, NetApp, takes a very respectable runner-up slot for the NetApp Store Vault S300, a $3,000 storage appliance that offers security, scalability, data protection and simplified management.
Also, IBM's SMB departmental machine, the [System i515 Express] was named runner-up for servers.
This week I'm in beautiful Guadalajara, Mexico teaching at our [System Storage Portfolio Top Gun class]. We have all of our various routes-to-market represented here, including our direct sales force, our technical teams, our online IBM.COM website sales, as well as IBM Business Partners. Everyone is excited over last week's IBM announcement of [4Q07 and full year 2007 results], which includes double-digit growth in our IBM System Storage business, led by sales of our DS8000, SAN Volume Controller and Tape systems. Obviously, as an IBM employee and stockholder, I am biased, so instead I thought I would provide some excerpts from other bloggers and journalists.
But what was striking in the company’s conference call on Thursday afternoon was the unhedged optimism in its outlook for 2008, given the strong whiff of recession fear elsewhere.
The questions from Wall Street analysts in the conference call had a common theme. Why are you so comfortable about the 2008 outlook? Now, that might just be professional churlishness, since so many of them have been so wrong recently about I.B.M. Wall Street had understandably thought, for example, that I.B.M.’s sales to financial services companies — the technology giant’s largest single customer category — would suffer in the fourth quarter, given the way banks have been battered by the mortgage credit crunch.
But Mr. Loughridge said that revenue from financial services customers rose 11 percent in the fourth quarter, to $8 billion. The United States, he noted, accounts for only 25 percent of I.B.M.’s financial services business.
The other thing that seems apparent is how much I.B.M.’s long-term strategy of moving up to higher-profit businesses and increasingly relying on services and software is working. Its huge services business grew 17 percent to $14.9 billion in the quarter. After the currency benefit, the gain was 10 percent, but still impressive. Software sales rose 12 percent to $6.3 billion.
Looking at IBM's business segments, it can be seen that they offer far more coverage of the technology space than those of the typical tech company:
IBM is just so big and diversified that there is little comparison between it and most other tech companies. IBM is a member of an elite group of companies like Cisco Systems (CSCO), Microsoft (MSFT), Oracle (ORCL) or Hewlett-Packard (HPQ).
IBM's wide international coverage and deep technological capabilities dwarf those of most tech companies. Not only do they have sales organizations worldwide but they have developers, consultants, R&D workers and supply chain workers in each geographic region. Their product mix runs from custom software to packaged enterprise software, hardware (mainframes and servers), semiconductors, databases, middleware technology, etc., etc. There are few tech companies that even attempt to support that many kinds and variations of products.
As color on the fourth quarter earnings announcement, there are a couple of observations that I would like to make. The first one speaks to IBM's international prowess. The company indicated that growth in the Americas was only 5%. International sales were a primary driver of IBM's good results. As an insight on the difference between IBM and most other tech companies, it is clear that nowadays, a tech company that isn't adept at selling internationally is going to be in trouble.
Terrific performance in a terrific year - no doubt a result of its strong global model. IBM operates in 170 countries, with about 65% of its employees outside US and about 30% in Asia Pacific. For fiscal 2007, revenues from Americas grew 4% to $41.1 billion (42% of total revenue), [EMEA] grew 14% to $34.7 billion (35% of total revenue), and Asia-Pacific grew by 11% to $19.5 billion (19.7% of total revenue). IBM sees growth prospects not just in [BRIC] but also countries like Malaysia, Poland, South Africa, Peru, and Singapore.
Thus far 2008–all two weeks of it–hasn’t been pretty for the tech industry. Worries about the economy prevail. And even companies that had relatively good things to say like Intel get clobbered. It’s ugly out there–unless you’re IBM.
I am sure there will be more write-ups and analyses on this over the coming weeks, and others will probably wait until more tech companies announce their results for comparison.
Fellow Blogger BarryB mentions "chunk size" in his post [Blinded by the light], as it relates to Symmetrix Virtual Provisioning capability. Here is an excerpt:
I mean, seriously, who else but someone who's already implemented thin provisioning would really understand the implications of "chunk" size enough to care?
For those of you who don't know what the heck "chunk size" means (now listen up you folks over at IBM who have yet to implement thin provisioning on your own storage products), a "chunk" is the term used (and I think even trademarked by 3PAR) to refer to the unit of actual storage capacity that is assigned to a thin device when it receives a write to a previously unallocated region of the device.
For reference, Hitachi USP-V uses I think a 42MB chunk, XIV NEXTRA is definitely 1MB, and 3PAR uses 16K or 256K (depending upon how you look at it).
Thin Provisioning currently offered in IBM System Storage N series was technically "implemented" by NetApp, and the Thin Provisioning that will be offered in our IBM XIV Nextra systems will have been acquired from XIV. Need I remind you that many of EMC's products were developed by other companies first, then later acquired by EMC, so there is no need for you to throw rocks from your glass houses in Hopkinton.
"Thin provisioning" was first introduced by StorageTek in the 1990's and sold by IBM under the name of RAMAC Virtual Array (RVA). An alternative approach is "Dynamic Volume Expansion" (DVE). Rather than giving the host application a huge 2TB LUN but actually using only 50GB for data, DVE was based on the idea that you give out only the 50GB needed now, and expand it in place as more space is required. This was specifically designed to avoid the biggest problem with "Thin Provisioning", which back then was called "Net Capacity Load" on the IBM RVA, but today is referred to as "over-subscription". It gave Storage Administrators greater control over their environment with no surprises.
In the same manner as Thin Provisioning, DVE requires a "chunk size" to work with. Let's take a look:
On the DS4000 series, we use the term "segment size", and indicate that the choice of a segment size can have some influence on performance in both IOPS and throughput. Smaller segment sizes increase the request rate (IOPS) by allowing multiple disk drives to respond to multiple requests. Large segment sizes increase the data transfer rate (Mbps) by allowing multiple disk drives to participate in one I/O request. The segment size does not actually change what is stored in cache, just what is stored on the disk itself. It turns out in practice there is no advantage in using smaller sizes with RAID-1; only in a few instances does this help with RAID-5, if you can write a full stripe at once to calculate parity on outgoing data. For most business workloads, 64KB or 128KB are recommended. DVE expands by the same number of segments across all disks in the RAID rank, so for example in a 12+P rank using 128KB segment sizes, the chunk size would be thirteen segments, about 1.6MB in size.
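As a quick check of that last figure, here is a minimal sketch of the arithmetic. The function name `dve_chunk_kb` is hypothetical, and the model is only the one stated above (one segment on every disk in the rank, parity disk included), not a description of DS4000 internals.

```python
def dve_chunk_kb(data_drives, parity_drives, segment_kb):
    """Size of one expansion step: one segment on each disk in the rank."""
    return (data_drives + parity_drives) * segment_kb

# The 12+P rank with 128KB segments from the example above.
chunk_kb = dve_chunk_kb(12, 1, 128)
print(f"{chunk_kb} KB, or {chunk_kb / 1024:.3f} MB")
```

Thirteen segments of 128KB come to 1664KB, which is the "about 1.6MB" quoted above.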
SAN Volume Controller
On the SAN Volume Controller, we call this "extent size" and allow it to be various values from 16MB to 512MB. Initially, IBM only managed four million extents, so this table was used to explain the maximum amount that could be managed by an SVC system (up to 8 nodes) depending on the extent size selected.
IBM thought that since we externalized "segment size" on the DS4000, we should do the same for the SAN Volume Controller. As it turned out, SVC is so fast up in the cache that we could not measure any noticeable performance difference based on extent size. We did have a few problems. First, clients who chose 16MB and then grew beyond the 64TB maximum addressable discovered that perhaps they should have chosen something larger. Second, clients called in to our help desk to ask what size to choose and how to determine the size that was right for them. Third, we allowed people to choose different extent sizes per managed disk group, but that prevents movement or copies between groups. You can only copy between groups that use the same extent size. The general recommendation now is to specify 256MB size, and use that for all managed disk groups across the data center.
The latest SVC expanded maximum addressability to 8PB, still more than most people have today in their shops.
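The relationship between extent size and maximum managed capacity is a single multiplication, sketched below. Two assumptions are mine: "four million" is taken as 4 x 2^20 extents, and the extent sizes are listed in power-of-two steps. With those assumptions, the 16MB case reproduces the 64TB limit mentioned above.

```python
MAX_EXTENTS = 4 * 2**20  # assumption: "four million" read as 4 Mi extents

def max_managed_tb(extent_mb, max_extents=MAX_EXTENTS):
    """Largest capacity (binary TB) addressable at a given extent size."""
    return extent_mb * max_extents / 2**20  # MB -> TB

# Assumed power-of-two extent sizes, in MB.
for extent_mb in (16, 32, 64, 128, 256, 512):
    print(f"{extent_mb:>4} MB extents -> {max_managed_tb(extent_mb):g} TB")
```

At the recommended 256MB extent size, the same extent count covers 1024TB, which is why choosing a larger extent up front avoids the problem the 16MB clients ran into.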
Getting smarter each time we introduce new function, we chose 1GB chunks for the DS8000. Based on a mainframe background, most CKD volumes are 3GB, 9GB, or 27GB in size, and so 1GB chunks simplified this approach. Spreading these 1GB chunks across multiple RAID ranks greatly reduced the hot-spots that afflict other RAID-based systems. (Rather than fix the problem by re-designing the architecture, EMC will offer to sell you software to help you manually move data around inside the Symmetrix after the hot-spot is identified.)
Unlike EMC's virtual provisioning, IBM DS8000 dynamic volume expansion does work on CKD volumes for our System z mainframe customers.
The trade-off in each case was between granularity and table space. Smaller chunks allow finer control on the exact amount allocated for a LUN or volume, but larger chunks reduced the number of chunks managed. With our advanced caching algorithms, changes in chunk size did not noticeably impact performance. It is best just to come up with a convenient size, and either configure it as fixed in the architecture, or externalize it as a parameter with a good default value.
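To put rough numbers on the granularity versus table space trade-off, the sketch below counts how many chunks must be tracked to map a given capacity. The 100TB capacity and the function name `table_entries` are hypothetical choices for illustration. The chunk sizes are the ones mentioned in this post (1MB, 42MB, 256MB and 1GB), and the count says nothing about how any particular product lays out its tables.

```python
def table_entries(capacity_tb, chunk_mb):
    """Number of chunks needed to map capacity_tb (binary TB), rounded up."""
    capacity_mb = capacity_tb * 2**20
    return -(-capacity_mb // chunk_mb)  # ceiling division on integers

# Hypothetical 100TB system; chunk sizes are those quoted in the post.
for chunk_mb in (1, 42, 256, 1024):
    print(f"{chunk_mb:>5} MB chunks -> {table_entries(100, chunk_mb):,} entries")
```

Going from 1MB to 1GB chunks shrinks the table by a factor of 1024, at the cost of allocating in 1GB steps.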
Meanwhile, back at EMC, BarryB indicates that they haven't determined the "optimal" chunk size for their new function. They plan to run tests and experiments to determine which size offers the best performance, and then make that a fixed value configured into the DMX-4. I find this funny coming from the same EMC that won't participate in [standardized SPC benchmarks] because they feel that performance is a personal and private matter between a customer and their trusted storage vendor, that all workloads are different, and you get the idea. Here's another excerpt:
Back at the office, they've taking to calling these "chunks" Thin Device Extents (note the linkage back to EMC's mainframe roots), and the big secret about the actual Extent size is...(wait for it...w.a.i.t...for....it...)...the engineers haven't decided yet!
That's right...being the smart bunch they are, they have implemented Symmetrix Virtual Provisioning in a manner that allows the Extent size to be configured so that they can test the impact on performance and utilization of different sizes with different applications, file systems and databases. Of course, they will choose the optimal setting before the product ships, but until then, there will be a lot of modeling, simulation, and real-world testing to ensure the setting is "optimal."
Finally, BarryB wraps up this section poking fun at the chunk sizes chosen by other disk manufacturers. I don't know why HDS chose 42MB for their chunk size, but it has a great [Hitchhiker's Guide to the Galaxy] sound to it, answering the ultimate question of life, the universe and everything. Hitachi probably went to their Deep Thought computer and asked how big their "chunk size" should be for their USP-V, and the computer said: 42. Makes sense to me.
I have to agree that anything smaller than 1MB is probably too small. Here's the last excerpt:
Now, many customers and analysts I've spoken to have in fact noted that Hitachi's "chunk" size is almost ridiculously large; others have suggested that 3PAR's chunks are so small as to create performance problems (I've seen data that supports that theory, by the way).
Well, here's the thing: the "right" chunk size is extremely dependent upon the internal architecture of the implementation, and the intersection of that ideal with the actual write distribution pattern of the host/application/file system/database.
So my suggestion to EMC is, please, please, please take as much time as you need to come up with the perfect "chunk size" for this, one that handles all workloads across a variety of operating systems and applications, from solid-state Flash drives to 1TB SATA disk. Take months or years, as long as it takes. The rest of the world is in no hurry, as thin provisioning or dynamic volume expansion is readily available on most other disk systems today.
Maybe if you ask HDS nicely, they might let you ask their computer.
This week was the 2008 MacWorld conference. I thought I would reflect on some of the storage related aspects of the products mentioned by Steve Jobsin his Keynote address.Many were updated version of products introduced last year's MacWorld. (In case you forgot whatthose were, here ismy post that covered [MacWorld 2007]).
(Disclaimer: IBM has a strong working relationship with Apple, and manufactures technology used in some of Apple's products. I own both an Apple iPod as well as an Apple G4 Mac Mini. IBM supports its employees using Apple laptops instead of Windows-based ones for work, and IBM has developed software that runs on Apple's OS X. Apple is kind enough to extend its "employee discount prices" to IBM employees.)
In the first 90 days of its release, Apple sold 5 million copies of Mac OS X 10.5 Leopard, representing 19 percent of Mac users. I am one of the 81 percent still using 10.4 Tiger, the previous level. My Mac Mini is based on a G4 POWER processor, and upgrading is on my [Someday/Maybe] list. I am not taking sides in the [OS X vs. Windows vs. Linux religious debate]; I use all three.
The key storage-related feature of Leopard is its backup software, Time Machine, and Steve Jobs announced a companion product called Time Capsule that serves as the external backup disk wirelessly, over 802.11n Wi-Fi. For many households, backup is either never done, or done rarely, so any help to simplify and relieve the burden is welcome.
Time Capsule comes in 500GB and 1TB SATA disk capacities, which Steve Jobs called "server-grade". What about a 750GB model? Looks like Apple followed EMC's example and went straight to 1TB instead. After EMC failed to deliver 750GB drives in 2007 that they [promised back in July], EMC blogger Chuck Hollis explains in his post [Enterprise Storage Strikes Back!]:
So there's something in the EMC goodie bag as well for you -- the availability of the new 1TB disk drives you've been hearing about. We skipped the 750GB drive and went right to the 1TB drive.
Apple iPhone and iPod Touch
In the first 200 days, Apple has sold 4 million phones, and has garnered nearly 20 percent of the smart phone market share. New features include a GPS-like location feature that uses [triangulation] with cell phone towers and Wi-Fi hotspots to determine where you are located.
I covered last year's introduction of the iPhone in my post on [Convergence]. All of the features Steve Jobs presented were software updates to the existing 8GB and 16GB models. No new models with larger storage were introduced.
I am a T-Mobile customer, so I am out of luck until either (a) Apple unlocks their phones from the AT&T network, or (b) Apple signs an agreement with T-Mobile in the USA. I reviewed the various hacks to unlock iPhones last year, but was not interested in losing official warranty or future software support.
The iPod Touch is an interesting alternative. It is basically an iPhone with the cell-phone features disabled, which gives you web access over Wi-Fi with the Safari browser, music, videos, and so on. Steve Jobs mentioned enhanced software updates for this as well. The iPod Touch comes in the same 8GB and 16GB sizes as the iPhone.
AppleTV and iTunes
Steve Jobs indicated that they have sold over 4 billion songs over iTunes, 125 million TV shows, and 7 million movies. He announced that iTunes would now allow movie rentals, with the option to start watching within 30 days, but once you start watching a movie, you have 24 hours to finish. I found it interesting that he said rentals were to reduce space on your hard drive, versus outright purchase of movie content.
In a rare concession, Steve admitted that the original AppleTV misunderstood the marketplace. The original AppleTV allowed you to view pictures and listen to music through your television, but people wanted to view movies. The software upgrade would allow this, using the iTunes rental model above, and would also let you watch video podcasts and over 50 million videos posted on YouTube.
Some television-related stats from [z/Journal] were quite timely. The older non-digital TVs could be used with the AppleTV and gaming systems like Nintendo Wii.
33 percent of U.S. households do not know what to do with (their older) TVs after digital switch (Feb 2009)
69 percent of Americans think PCs are more entertaining than TV
Rather than try to fight peer-to-peer website piracy, Apple cleverly decided to compete head-to-head against it. This is well summarized in Matt Mason's 6-minute video [The Pirate's Dilemma]. Eleven major movie studios are on board with Apple's movie rental plans, making thousands of movie titles available for this, with hundreds in High Definition (HD).
I personally have a Tivo, connected wirelessly to a regular non-HD television, as well as my PC, Mac and internet hub, and this allows me to view my photos, listen to my iTunes collection of music and internet radio stations from [Live365], as well as rent movies and TV shows from Amazon Unbox, with prices ranging from free to four dollars.
The theme of this week was "Something is in the Air", an obvious reference to the MacBook Air, billed as the world's thinnest laptop. John Windsor on his YouBlog writes [Making it Memorable] about the use of a standard office envelope to demonstrate how thin this new laptop is. It is 0.16 inches at one end, and 0.76 inches at the other end. Unlike other "ultra-thin" laptops, this has a full-size back-lit keyboard and a full-size 13.3 inch widescreen. The touchpad supports multi-touch gestures similar to the iPhone and iPod Touch. Intel managed to shrink down their Core 2 Duo processor chip by 60 percent to fit inside this machine. The battery is reported to last five hours.
This laptop was designed for wireless access, with 802.11n and Bluetooth enabled. There is no RJ-45 port for a traditional Ethernet LAN connection, but I guess you can use a USB-to-RJ45 converter.
Storage-wise, you can choose between the 1.8-inch 80GB HDD or a pricey-but-faster 64GB Flash Solid-State Disk (SSD). In a move similar to [getting rid of the 3.5-inch floppy disk in 1998's iMac G3], the MacBook Air got rid of the CD/DVD drive. While they offer a USB-attachable SuperDrive as an optional peripheral, Steve Jobs gave alternative methods:
Watching movies on DVD: Rent or buy from iTunes instead
Burning music CDs for your car stereo: Attach your iPod to your car stereo
Taking backups to CD or DVD: Use Time Machine and Time Capsule instead
Installing software from CD: Wirelessly connect to a "Remote Optical Disc" on a Mac or PC, running special Apple-provided software that allows you to make this connection
Here's a link to the 90-minute [keynote address video]. If you are not a fan of recycling, saving the environment, free speech or democracy, you can safely skip the last 15 minutes when musical artist Randy Newman performs. For alternative viewpoints on the keynote, see posts from [John Gruber] and [Tara MacKay].
In addition to creating the Dilbert cartoon, Scott Adams has a blog, which sometimes is quite serious, and other times quite funny. The anticipated 30x cost of "Flash Drives" for Enterprise disk systems reminded me of one of Scott's articles from November 2007 titled [Urge to Simplify]. Here's an excerpt:
Now the casinos have people trained, like chickens hoping for pellets, to take money from one machine (the ATM), carry it across a room and deposit in another machine (the slot machine). I believe B.F. Skinner would agree with me that there is room for even more efficiency: The ATM and the slot machine need to be the same machine.
The casinos lose a lot of money waiting for the portly gamblers with respiratory issues to waddle from the ATM to the slot machines. A better solution would be for the losers, euphemistically called “players,” to stand at the ATM and watch their funds be transferred to the hotel, while hoping to somehow “win.” The ATM could be redesigned to blink and make exciting sounds, so it seems less like robbery.
I’m sure this is in the five-year plan. Longer term, people will be trained to set up automatic transfers from their banks to the casinos. People will just fly to Vegas, wander around on the tarmac while the casino drains their bank accounts, then board the plane and fly home. The airlines are already in on this concept, and stopped feeding you sandwiches a while ago.
Perhaps EMC can redesign its DMX-4 to "blink and make exciting sounds" as well. The Flash Drives were designed for the financial services industry, so those disk systems could be directly connected to make transfers between the appropriate bank accounts.