Last Friday, the "Greater IBM Connection" team held a "red carpet" event, showcasing the winners of the Second Life "machinima" contest. Machinima is best explained on the Linden Lab website:
Machinima is the art of making real movies in virtual worlds.
Movies made in Second Life use the world's building, scripting, and avatar customization tools, working in real-time collaboration with people around the globe. You can use Second Life as your own virtual back lot, soundstage, choreography studio, costume and prop repository, and special effects house.
The seven videos were shown in Second Life, and are now available on YouTube for those who missed them.
technorati tags: IBM, Greater IBM Connection, Second Life, machinima, red carpet
Tuesday is always good for announcements. Today, Gartner, Inc. announced that IBM has overtaken HP in its climb to the top. I'll quote directly from today's press release:
STAMFORD, Conn., March 6, 2007 — Worldwide external controller-based (ECB) disk storage revenue totaled $15.2 billion in 2006, a 4.1 percent increase over 2005 revenue of $14.6 billion, according to Gartner, Inc. IBM overtook Hewlett-Packard for the No. 2 position in 2006 (see Table 1). IBM’s worldwide ECB market share increased to 15.8 percent, while HP’s market share dropped to 13.1 percent.
IBM beat HP both in 4Q06 and for the full year 2006. You can read more about it in the Gartner Dataquest report “Market Share: Disk Array Storage, All Regions, All Countries, 1Q05-4Q06” on their website. (Note: non-IBMers might need an account with Gartner to access this; I am not sure.)
The focus was on external controller-based disk, not external controller-less SCSI/SAS disk, not disk arrays posing as virtual tape libraries, nor any disk sold inside HP, Sun, IBM or Dell servers. This makes the numbers comparable with disk-only vendors such as EMC and HDS. The revenues reflect hardware only, including hardware-related parts of financial leases and managed services. Revenues from optional priced software features such as multi-pathing drivers, management software, or advanced copy services were excluded. I discussed these types of analyst reports in a blog post last September: Space Race Heats Up.
These market share numbers are based on revenues, not units or terabytes. When a box gets sold, the revenue is counted toward the vendor that sold it, not the manufacturer that built it. In this last report:
- When Dell sells an EMC box, it gets counted as Dell. When Fujitsu Siemens sells an EMC box, it gets counted as "Other".
- When HP sells an HDS box, it gets counted as HP. When Sun sells the HDS box, it gets counted as Sun.
- When IBM sells its System Storage N series (from the OEM agreement with NetApp), it gets counted as IBM. Both IBM and NetApp experienced growth in the NAS/unified storage arena.
It's still cold here in the Washington DC area, but at least good news like this helps warm me up!
technorati tags: IBM, disk, external controller-based, ECB, Gartner, 4Q06, 2006, revenue, marketshare, HP, EMC, Sun, Dell, NetApp, HDS, NAS
In his blog post on preparation, Seth Godin mentioned an appropriate Swedish saying:
There is no bad weather, just bad clothing.
Appropriate because it snowed here in Tucson, Arizona on Sunday evening, leaving many of us here figuring out how to drive through the stuff on Monday. In my entire lifetime, I have only witnessed snow down in the Tucson valley a handful of times. It got me thinking about coats, and the wonderful schemes for coat check rooms, as an analogy for data access. A lot of people ask me to compare and contrast one technology with another, say block-level virtualization with content-addressable storage, and so on, and I always try to find a good analogy to help explain things.
Let's start with the setting. It is snowing outside and people are wearing coats. When they come inside, they check their coats at a coat check room, a large room with rows and rows of racks with hangers. A coat check attendant takes your coat and puts it on a hanger, and gives you a ticket or other identifier that will allow you to retrieve your coat later. The ticket must have sufficient information to retrieve the coat quickly, rather than searching rows and rows of hangers for it.
- Block-based disk storage
You walk to the coat-check desk, tell the attendant to hang your coat on a specific hanger, say hanger number 387. When you come back, you ask for the coat on hanger 387. The coat-check attendant knows exactly where hanger 387 is, and is able to retrieve it quickly. Most disk systems use this approach, including IBM SAN Volume Controller and DS family of disk systems.
- Name-based disk storage
You walk to the coat-check desk, tell the person the name that you want to call your coat. An empty hanger is located, and a list of coat names, with their associated hanger number, is then kept. Upon return, you ask for your coat by name, and the coat-check attendant looks up the hanger number to match, and retrieves your coat. This is the scheme used by the IBM System Storage DR550, N series for NAS storage, and the IBM Healthcare and Life Sciences Grid Medical Archive Solution (GMAS).
- Content-addressable storage (CAS)
You walk to the coat-check desk and hand them your coat. The attendant weighs your coat, checks the brand, the size, the number of buttons and zippers, types it all in, and the computer spits out a "hash code" from 1 to 99999. An empty hanger is found, and the hash code is associated to the hanger number. Upon return, you provide the hash code you were given, and the coat-check attendant looks up the hanger number to match, and retrieves your coat. This is the scheme used for some non-erasable, non-rewriteable storage, such as the EMC Centera.
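The three coat-check schemes above can be sketched in a few lines of Python. This is only an illustration of the analogy, not how any of the products mentioned are implemented: the class names are made up for this post, and SHA-256 from Python's standard library is an assumed stand-in for the "hash code" machine.

```python
import hashlib


class BlockStore:
    """Block-based: the customer picks the hanger number."""

    def __init__(self, hanger_count):
        self.hangers = [None] * hanger_count

    def write(self, hanger, coat):
        self.hangers[hanger] = coat

    def read(self, hanger):
        return self.hangers[hanger]


class NameStore:
    """Name-based: the customer picks a name; the attendant keeps a name-to-hanger list."""

    def __init__(self):
        self.hangers = []
        self.directory = {}

    def write(self, name, coat):
        self.hangers.append(coat)                    # next empty hanger
        self.directory[name] = len(self.hangers) - 1

    def read(self, name):
        return self.hangers[self.directory[name]]


class ContentStore:
    """Content-addressable: the ticket is computed from the coat itself."""

    def __init__(self):
        self.hangers = []
        self.directory = {}

    def write(self, coat):
        ticket = hashlib.sha256(coat).hexdigest()    # assumed hash, for illustration
        if ticket not in self.directory:
            self.hangers.append(coat)
            self.directory[ticket] = len(self.hangers) - 1
        return ticket

    def read(self, ticket):
        return self.hangers[self.directory[ticket]]
```

The difference shows up in who chooses the identifier: the caller supplies it in the first two classes, while `ContentStore.write` hands one back.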
IBM invented hash codes in 1953 as a way to speed up searches. For example, if you want to look up a word in the dictionary, knowing the first letter of the word makes it much quicker, because you can thumb directly to that section. A hash code was intended to give a more even distribution, so that if a million words are stored in a "hash code dictionary" then you would calculate the hash code, then look up only that section of words associated with that specific hash code number.
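Here is a rough sketch of the "hash code dictionary" idea, comparing sections chosen by first letter with sections chosen by a hash code. The function names and the simple multiply-by-31 hash are my own choices for illustration, not the 1953 method.

```python
from collections import defaultdict


def first_letter_section(word):
    """Thumb-index approach: the section is the first letter."""
    return word[0].lower()


def hash_section(word, sections=26):
    """Hash approach: mix every character, then pick one of `sections` sections."""
    code = 0
    for ch in word:
        code = (code * 31 + ord(ch)) % sections
    return code


def build_index(words, section_of):
    """Group words by section so a lookup only scans one section."""
    index = defaultdict(list)
    for word in words:
        index[section_of(word)].append(word)
    return index


def lookup(index, word, section_of):
    """Search only the section the word would have been filed under."""
    return word in index.get(section_of(word), [])
```

With words that mostly start with the same letter, the first-letter index piles them into one section, while the hash version has a chance to spread them out.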
A problem arises when you generate "hash codes" for storage. It is possible for two different pieces of data to resolve to the same hash code. When an application tries to write a piece of data, and it resolves to a hash code that already exists, that is called a collision. One response is to compare the incoming data to the data that is already stored and confirm they are identical, but that can be time-consuming. The other response is to just assume they are identical, and reject the second copy, a process often referred to as "de-duplication".
What's the chance of getting a collision for data that is really different? Let's take for example the famous Birthday paradox. Suppose the coat check room assigned the hanger based on your birthday (month and day). How many coats before you run the risk of having two people turn in coats with the same birthday? After only 23 people, the likelihood is 50%. At 60 people, it goes up to 99%.
For this reason, IBM does not offer content-addressable storage. For non-erasable, non-rewriteable storage, the IBM System Storage DR550 requires the application to give each object a name, and that name is then used to store the data, eliminating the possibility that data might accidentally be thrown away.
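The two responses to a collision can be sketched as follows. This is a toy model under stated assumptions: `weak_hash` is a deliberately poor hash I made up so that collisions are easy to trigger (it still lands in the 1 to 99999 range from the coat-check example), and `DedupStore` is a hypothetical class, not any vendor's design.

```python
def weak_hash(data: bytes) -> int:
    """Deliberately weak hash code in the range 1..99999 (illustration only)."""
    return sum(data) % 99999 + 1


class DedupStore:
    def __init__(self, verify):
        self.verify = verify      # True: compare contents; False: trust the hash code
        self.objects = {}

    def write(self, data: bytes) -> int:
        code = weak_hash(data)
        existing = self.objects.get(code)
        if existing is None:
            self.objects[code] = data
            return code
        if self.verify and existing != data:
            # Slower path: the byte-for-byte comparison exposes the collision.
            raise ValueError(f"collision on hash code {code}")
        # Either a true duplicate, or an unverified assumption of one:
        # the incoming copy is discarded.
        return code

    def read(self, code: int) -> bytes:
        return self.objects[code]
```

With `verify=False`, writing `b"ab"` and then `b"ba"` returns the same hash code both times, and a later read gives back `b"ab"`: the second piece of data is silently gone.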
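The birthday numbers are easy to check. The sketch below assumes 365 equally likely birthdays (ignoring February 29), and the function name is just for illustration.

```python
def collision_probability(people, slots=365):
    """Chance that at least two of `people` share one of `slots` equally likely values."""
    p_all_different = 1.0
    for i in range(people):
        p_all_different *= (slots - i) / slots
    return 1.0 - p_all_different


print(round(collision_probability(23), 3))  # about 0.507
print(round(collision_probability(60), 3))  # about 0.994
```

The same function works for any number of hangers or hash codes by changing `slots`.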
It's safer that way.
technorati tags: Seth Godin, Swedish, saying, bad, weather, clothing, snow, Tucson, coat, check, room, IBM, block-based, disk, storage, DR550, N series, NAS, healthcare, life sciences, grid, medical, archive, solution, GMAS, content-addressable, CAS, EMC, Centera, hash code, collision, de-duplication, birthday, paradox
Today, January 16, IBM launches its latest disk system, the DS3000 series.
There are actually three products in the DS3000 series:
The DS3200 is a 2U high, 12 drive system that attaches to servers via a 3Gbps Serial Attached SCSI (SAS) interface. You can expand this to 48 drives by adding EXP3000 expansion units. Here are the DS3200 specifications.
The DS3400 is a 2U high, 12 drive system that attaches to servers via a 4Gbps Fibre Channel (FC) interface. You can expand this to 48 drives by adding EXP3000 expansion units. Here are the DS3400 specifications.
The EXP3000 is a 2U high, 12 drive expansion drawer. It was announced back in August 2006, but is part of the overall DS3000 series. It can be used directly with servers, but is also designed to be attached to the back of the DS3200 or DS3400 to increase capacity. Here are the EXP3000 specifications.
With this announcement, IBM provides entry-level storage at the "less-than-$5000" price point, with support for intermix of 10K and 15K RPM drives, and scalable up to 14.4 TB capacity. This would be ideal storage for HP, Dell, IBM System x and BladeCenter servers.
technorati tags: IBM, disk, DS3000, DS3200, DS3400, EXP3000, HP, Dell, SAS, SATA, FC
On his "Data Storage - Dullness becomes Mainstream" blog, Chris Evans is amazed at how low they can go! He compares the latest 100GB Toshiba 1.8" drive designed for portable music players to the size and weight of older technology, like the IBM 3380 Direct Access Storage Device (DASD).
Chris couldn't find the dimensions of the 3380, so I thought I would provide the missing detail. The IBM 3380 History Archives provides a nice summary:
- The CJ2 model that Chris mentions was announced September 1, 1987 and shipped in 1988. Earlier models of the 3380 were announced 1980-1986.
- Capacity and performance were measured in 7-bit "characters", since we were not yet storing full 8-bit bytes.
- By today's standards, having such a large box to hold a few GB might seem amusing, but at the time, this unit had four times the capacity of its predecessor, the IBM 3350 DASD. Compare that with our first disk system, the IBM 350 Disk Storage Unit, introduced in 1956, that stored only 5 million characters (5MB) and was the size of two refrigerators.
- The term "DASD", pronounced daz-dee, was used as some earlier devices were based on magnetic drums or strips of magnetic tape. Today, DASD is still a common term for disk systems among mainframe administrators.
- The 3380 was also twice as fast as the IBM 3350, at 3 million characters per second (3 MB/sec). The irony was that the mainframe servers could not keep up, so a Speed Matching Buffer feature was invented to slow it down to half-speed when used with certain models of mainframe.
As for the dimensions, I too had a hard time finding a publicly available resource that listed 3380 dimensions, so I searched internal IBM resources, and finally asked someone over in the next building just to measure one of the 3380K models we still have on the Tucson test lab floor. The dimensions are ... (drumroll please)
- 70 inches (1778mm) tall
- 44 inches (1117mm) wide
- 32 inches (812mm) deep
The result is that the box could actually hold a much more impressive 52,500 of the new Toshiba drives, twice the original, albeit conservative, estimate. Before anyone "tries this at home", however, keep in mind that around each Toshiba drive, as with any ATA drive, you need to have all the electronics to communicate to the outside world, and provide cooling. Running tens of thousands of these little guys in the space of 60 square feet would probably melt the floor or set off your smoke alarm system.
At least take a backup first.
technorati tags: Chris Evans, Toshiba, IBM, 3380, DASD, CJ2, 3350, ATA
Chris Anderson, of Wired magazine, wrote a great article called The Long Tail.
This article became a book by the same name published earlier this year, and I just discovered it on a recent visit to Second Life. A lot of IBMers are now also Second Lifers, and I suspect it is just a matter of time before we are conducting our customer briefings there, and getting our year-end bonuses paid directly in Linden bucks. (Those of you not familiar with Second Life can watch this 3-minute video from the folks at Text100.)
Anyways, the Long Tail describes the new economy of entertainment thanks to digital storage. Here are some of the key insights.
- In the past, entertainment was all about hits: hit songs, hit movies, hit novels, and this was primarily because of the economic realities restricted by physical space. Chris writes: "An average movie theater will not show a film unless it can attract at least 1,500 people over a two-week run; that's essentially the rent for a screen. An average record store needs to sell at least two copies of a CD per year to make it worth carrying; that's the rent for a half inch of shelf space."
- Things have changed. To drive the point home, Robbie Vann-Adibe (CEO of eCast) poses the trick question "What percentage of the top 10,000 titles in any online media store (Netflix, iTunes, Amazon, or any other) will rent or sell at least once a month?" The answer will surprise you. Write down your guess first, then go read here. His digital jukeboxes are able to play from a list of 150,000 songs, not the few hundred you'd find at the Tap Room, which is rated as having the best jukebox in Tucson.
- The phenomenon is not just limited to music. "Take books," Chris writes, "The average Barnes & Noble carries 130,000 titles. Yet more than half of Amazon's book sales come from outside its top 130,000 titles. Consider the implication: If the Amazon statistics are any guide, the market for books that are not even sold in the average bookstore is larger than the market for those that are..."
This has incredible implications for the storage industry. For one, content providers are going to dig deep into their archives to digitize and deliver "long tail" offerings. If they don't have a deep archive, many will start to build one. Second, the need to search through that large volume of content will become more critical. Classifying and indexing with the appropriate tags and metadata will be an important task.
technorati tags: Chris Anderson, Wired, magazine, IBM, Secondlife, Linden bucks, Text100, Long Tail, Robbie+Vann-Adibe, eCast, NetFlix, iTunes, Amazon, Tap Room, Barnes Noble, deep, archive, metadata, tags
Last week, Steve Jobs demonstrated the latest evidence that the inmates are running the asylum over at Apple.
I wasn't at the event, but thought it would be good to explain some basic concepts of Information Lifecycle Management (ILM), using the files on my iPod as an example. (Disclosure: IBM makes the technology inside many of Apple's computers, and so IBMers get to buy Apple products at employee prices. I own a Mac Mini based on IBM's POWER4 processor, and an iPod Photo 60GB model.)
I have 20,000 MP3 music files, representing 106GB of data. This fits nicely on my 250GB external disk system attached to my Mac Mini, but won't all fit on my little 60GB iPod. I needed a way to decide what music I keep on both my iPod and Mac Mini, and which I keep only on my Mac Mini. When I am traveling, I am able to listen only to the music in the first group, but when I am at home, I am able to listen to all my music in both groups. (Another disclosure: I use my Tivo connected to my LAN to play all my MP3 music through my home stereo system. I had my entire house wired with Cat5 to make this possible.)
Apple's iTunes software lets me decide which MP3 files are copied to my iPod using "playlists". A playlist is a list of songs. Fixed playlists are created manually, each song copied to its list in a specific order. Smart playlists are created automatically, via policy. I give it the criteria, and it finds the songs for me. If I import a new music CD, none of the songs will be added to any fixed playlists, but could be added to my smart playlists if I set the policies correctly. Apple iTunes supports both "include" and "exclude" methodologies.
I use primarily smart playlists, based on genre and rating. I have tried to keep the number of genres down to a small manageable list:
- Rhythm & Blues
- Hip Hop
Of course, what I have for genre may not match what's in the Gracenote database, so I sometimes have to make updates to match my convention. I've picked these based on my different "applications" for my music. For example, I listen to Ambient music to help me fall asleep on airplanes, but Rock when I exercise at the gym.
Next, I use the ratings from one to five stars. The advantage to the rating is that I can change them on-the-fly directly on my iPod. All other "metadata" has to be entered only from the keyboard of my Mac Mini.
- One star: Files for Mac Mini only, not copied to my iPod
- Two stars: Non-mix, copied to my iPod, but typically spoken words, such as language lessons
- Three stars: Mix, music to include in my music mixes
- Four stars: Keep on my iPod, but re-evaluate
So, I have five smart playlists, "One Star", "Two Stars", etc. for each rating, and have decided to keep only the 2, 3, 4 and 5 star songs on my iPod, by simply putting check marks on those playlists to copy them over. I have about 50 songs with 5 stars, and 8000 with 3 stars, and the rest in the other categories, leaving me a few GB to spare.
I also have playlists for each genre, "Rock mix", "Pop Mix", "Ambient Mix", etc. where I have selected those that match the genre, AND have 3, 4 or 5 stars. In this manner, I can listen to a mix. If I find a song mis-classified for that genre, I change it to four stars, which serves as my reminder to re-evaluate when I am back at home on my Mac Mini. If I don't want a song in my mix, I just lower it to 2 stars. If I want it off my iPod altogether, I lower it to one star.
This method is simple enough, and allows me to enjoy my music right away, and more effectively, without having to wait until my classification process is completely finished.
Next week, I'm traveling to Africa (purely vacation, not related to my job, my senator, or my involvement in any charitable organizations). My Canon camera has only a 1GB IBM Microdrive, but I am able to offload my pictures to my iPod, connected via USB cable, and review the pictures on the little 2-inch screen. By simply "unchecking" my 2-star and 3-star playlists, and checking only those mixes I plan to take with me, I was able to clear 17GB of space, plenty of room for all my photos of elephants and giraffes, but still plenty of music to listen to. Thanks to my simple methodology, I was able to do this with minimal effort, and will have no problem putting all my music back when I return.
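For those who think better in code, the policy behind my smart playlists could be sketched like this. It is not how iTunes works internally; the `Song` record, the function names, and the sample library are all hypothetical, made up to mirror the star and genre rules described above.

```python
from dataclasses import dataclass


@dataclass
class Song:
    title: str
    genre: str
    stars: int
    size_mb: float


def smart_playlist(library, genre=None, min_stars=1, max_stars=5):
    """Select songs by policy instead of by hand."""
    return [
        song for song in library
        if (genre is None or song.genre == genre)
        and min_stars <= song.stars <= max_stars
    ]


def ipod_selection(library):
    """Two stars and up get copied to the iPod; one star stays on the Mac Mini."""
    return smart_playlist(library, min_stars=2)


# Hypothetical sample library
library = [
    Song("Track A", "Rock", 3, 5.0),
    Song("Track B", "Rock", 1, 4.0),
    Song("Lesson 1", "Spoken", 2, 10.0),
    Song("Track C", "Ambient", 4, 6.0),
]

rock_mix = smart_playlist(library, genre="Rock", min_stars=3)
on_ipod = ipod_selection(library)
```

Changing one song's `stars` value changes which playlists it lands in the next time the policy runs, which is the whole point: classify once, and let the policy do the copying.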
When evaluating an ILM process, many people are overwhelmed by their fear of the classification process, when in reality it doesn't have to be so complicated.
Is there an "iTunes" for the storage in your datacenter? Yes! It's called IBM TotalStorage Productivity Center. It can help you list and classify all the files in your IT environment, including files in your internal disks inside the servers, your NAS and SAN external disk systems, across both IBM and non-IBM hardware. It's a good thing to consider as part of your overall ILM strategy.
technorati tags: Apple, Steve Jobs, inmates, running, asylum, IBM, information, lifecycle, management, iPod, music, genre, star, rating, iTunes, datacenter, TotalStorage, Productivity Center, NAS, SAN, ILM
Welcome to my new blog on IBM Developerworks!
I am Tony Pearson, IBM brand marketing strategy, located in Tucson, Arizona. I have degrees in Computer Engineering and Electrical Engineering from the University of Arizona. Over the past 20 years, I have worked in a variety of storage roles, including development projects, product and portfolio management, testing, field support, and now bring that technical experience to marketing.
There are a lot of things to discuss related to storage, and I am never short of opinions. As such, the standard IBM disclaimer applies: “The postings on this site solely reflect the personal views of the author and do not necessarily represent the views, positions, strategies or opinions of IBM or IBM management.”
I have invited other IBMers to post their opinions, and when they do, their opinions may not necessarily match mine either.
This is an open two-way conversation between IBM, Business Partners, Independent Software Vendors, prospects and existing clients. I encourage everyone to post comments about our products, services, and marketing efforts.
Some job titles can be vague. Have you ever given your title to a person at a cocktail party, only to have to explain exactly what you do? With a title like "IBM Master Inventor and Senior Managing Consultant", this happens to me all the time. To help explain what we do at the Tucson Executive Briefing Center (EBC), I use the following analogy.
People who want to see or interact with animals have several options. One option is to go visit the animals in their natural habitat. A more convenient option, however, is to visit the animals in a zoo. Zoos bring together a wide variety of animals, making it convenient to visit all of them at one time.
I did not fully appreciate the advantage of zoos until I took a safari in Kenya, Africa a few years ago. The word safari means "long journey" in Swahili. For two weeks, we drove around in a Land Rover on bumpy roads across the country. The best time to see the animals was early in the morning and late in the afternoon. We would drive around for hours looking for a type of animal we had not seen already. Most came to see the so-called "Big Five": Buffalo, Elephant, Leopard, Lion and Rhinoceros. After two weeks and hundreds of miles, we had seen the "Big Nine" which extends the Big Five to include the Cheetah, Zebra, Giraffe and Hippo, as well as a variety of other, lesser known animals.
When it comes to zoos, there are two kinds.
- Self-guided -- offering the basic zoo experience where you are handed a map to visit the animals on your own.
- Docent-guided -- offering a richer zoo experience where the docent provides added value, leading visitors around the zoo, answering questions, providing education, and comparing the differences between the animals.
Over the past 15 years, IBM has been consolidating storage development in Tucson, Arizona moving storage-related projects from San Jose, CA, from Rochester, MN, and from Raleigh, NC. Tucson has the largest collection of IBM storage hardware and software development in North America. I am one of the three local "docents", guiding the clients that come to Tucson to visit the developers.
(Note: I have seen other analogies to discuss groups of developers. There is an old adage: engineers are [like mushrooms: kept in the dark, covered with manure, and then canned when they are old enough]. In 2008, I had a popular blog post relating [Software Programmers as Bees]. In referring to developers as animals in the zoo in this post, I am treating them in high esteem as the star attractions of the zoo. This blog is not meant as commentary on their hygiene.)
Here are some of the types of developers that our clients ask to interact with:
- Research Scientists
I was hired into IBM back in 1986 as a Research Scientist. When clients want to hear about IBM's future direction over the next 10-15 years, we bring in someone from IBM Research.
- Hardware Engineers
While disk systems may seem no more complicated than arranging books on a shelf, clients often want to talk to hardware engineers related to IBM's tape libraries, especially the IBM System Storage TS3500 library and the High-Density frame that can store multiple cartridges per slot in a spring-loaded manner.
- Software Engineers
I have a Bachelor's degree in Computer Engineering and Master's degree in Electrical Engineering, so I am able to speak both sides of the hardware/software divide. Software engineers here in Tucson develop the microcode that runs on disk and tape hardware, the various GUI, CLI and SMI-S API interfaces, as well as Tivoli Storage software, especially Tivoli Storage Manager (TSM) and Tivoli Storage Productivity Center.
- Testers
IBM Tucson has a huge test lab, and our testers are very familiar with all of the subtle nuances of interoperability between servers, HBAs, switches and storage devices. We have system and function testers for the individual products, ISV testers to validate software compatibility, performance testers, and environment testers to verify the storage devices can handle extremes in temperature, humidity, vibration and noise.
- System Architects
IBM has architects for each product line to help decide which features and functions are developed for each product release. While many software engineers have expertise narrowly focused on an individual component, the system architects need to have a broad awareness of the entire environment. Earlier in my career, I was the chief architect for DFSMS, the storage management element of the z/OS mainframe operating system, and chief architect for what we now call Tivoli Storage Productivity Center.
- Product and Portfolio Managers
Product and Portfolio managers are helpful in explaining to clients why IBM invested more in some products than others. I served as the Portfolio Manager for IBM tape systems. When clients want to talk about the business side of our products, such as pricing, licensing and leasing issues, we bring the product and portfolio managers in.
- IBM Executives
For some clients, high level executives want to speak to their counterparts at IBM, vice president to vice president, executive to executive. Our local IBM executives often help kick off the briefing in the morning, or provide the executive summary and discuss next steps at the end of the day. Golfing, dinners and drinks, of course, are always a popular scheduling option.
On behalf of the rest of the Tucson EBC, I would like to thank all the developers who have helped us last year with client briefings. There are too many to mention, and most are too humble to let me put their names in this blog. Team, your assistance is very appreciated!
Many IBMers consider Tucson to be the headquarters for storage, and I have heard IBM executives refer to Tucson as the center of the universe for storage products. However, IBM is a global company. Just as zoos do not pretend to be complete collections of animals, IBM storage development is not entirely contained in Tucson. IBM Research for storage is also done in Almaden CA, Yorktown Heights NY, and Haifa, Israel. Hardware development is also done in Japan, Europe and Israel. Tivoli Storage has locations in Beaverton, Oregon, and Austin, Texas, to name a few. IBM is a big company, so if I left your favorite location off the list, let me know in the comments below.
Some clients, sales reps and business partners have complained that Tucson is not the most convenient location to get to. I get that. One rep asked why we don't have briefing centers somewhere more accessible, such as Chicago or Atlanta, both of which offer a major airline hub. As much as I personally enjoy cities like Chicago or Atlanta, people don't visit zoos just to see the docents; they come to see the animals. Having docents located in Chicago or Atlanta, standing sadly in front of empty cages with no animals to interact with, makes no sense at all.
With over 350 days of sunshine per year, Tucson is actually a well-kept secret. Clients who have never been to Tucson discover the wonders of the Sonoran desert. Coyotes chase roadrunners across our parking lot. Several clients who have come to visit us have ended up buying retirement homes here. If you haven't been to Tucson, or it has been a while since your last trip, I encourage you to [schedule a briefing]. The weather right now is ideal!
technorati tags: IBM, Tucson EBC
Did IBM XIV force EMC's hand to announce VMAXe? Let's take a stroll down memory lane.
In 2008, IBM XIV showed the world that it could ship a Tier-1, high-end, enterprise-class system using commodity parts. Technically, prior to its acquisition by IBM, the XIV team had boxes out in production since 2005. EMC incorrectly argued this announcement meant the death of the IBM DS8000. Just because EMC was unable to figure out how to have more than one high-end disk product, doesn't mean IBM or other storage vendors were equally challenged. Both IBM XIV and DS8000 are Tier-1, high-end, enterprise-class storage systems, as are the IBM N series N7900 and the IBM Scale-Out Network Attached Storage (SONAS).
In April 2009, EMC followed IBM's lead with their own V-Max system, based on Symmetrix Engenuity code, but on commodity x86 processors. Nobody at EMC suggested that the V-Max meant the death of their other Symmetrix box, the DMX-4, which means that EMC proved to themselves that a storage vendor could offer multiple high-end disk systems. Hitachi Data Systems (HDS) would later offer the VSP, which also includes some commodity hardware.
In July 2009, analysts at International Technology Group published their TCO findings that IBM XIV was 63 percent less expensive than EMC V-Max, in a whitepaper titled [COST/BENEFIT CASE FOR IBM XIV STORAGE SYSTEM Comparing Costs for IBM XIV and EMC V-Max Systems]. Not surprisingly, EMC cried foul, feeling that because EMC V-Max had not yet been successful in the field, it was too soon to compare newly minted EMC gear with a mature product like XIV that had been in production accounts for several years. Big companies like to wait for "Generation 1" of any new product to mature a bit before they purchase.
To compete against IBM XIV's very low TCO, EMC was forced to either deeply discount their Symmetrix, or counter-offer with lower-cost CLARiiON, their midrange disk offering. An ex-EMCer who now works for IBM on the XIV sales team put it in EMC terms -- "the IBM XIV provides a Symmetrix-like product at CLARiiON-like prices."
(Note: Somewhere in 2010, EMC dropped the hyphen, changing the name from V-Max to VMAX. I didn't see this formally announced anywhere, but it seems that the new spelling is the officially correct usage. A common marketing rule is that you should only rename failed products, so perhaps dropping the hyphen was EMC's way of preventing people from searching older reviews of the V-Max product.)
This month, IBM introduced the IBM XIV Gen3 model 114. The analysts at ITG updated their analysis, as there are now more customers that have either or both products, to provide a more thorough comparison. Their latest whitepaper, titled [Cost/Benefit Case for IBM XIV Systems: Comparing Cost Structures for IBM XIV and EMC VMAX Systems], shows that IBM maintains its substantial cost savings advantage, representing 69 percent less Total Cost of Ownership (TCO) than EMC, on average, over the course of three years.
In response, EMC announced its new VMAXe, following the naming convention EMC established for VNX and VNXe. Customers cannot upgrade VNXe to VNX, nor VMAXe to VMAX, so at least EMC was consistent in that regard. Like the IBM XIV and XIV Gen3, the new EMC VMAXe eliminated "unnecessary distractions" like CKD volumes and FICON attachment needed for the IBM z/OS operating system on IBM System z mainframes. Fellow blogger Barry Burke from EMC explains everything about the VMAXe in his blog post [a big thing in a small package].
So, you have to wonder, did IBM XIV force EMC's hand into offering this new VMAXe storage unit? Surely, EMC sales reps will continue to lead with the more profitable DMX-4 or VMAX, and then only offer the VMAXe when the prospective customer mentions that the IBM XIV Gen3 is 69 percent less expensive. I haven't seen any list or street prices for the VMAXe yet, but I suspect it is less expensive than VMAX, on a dollar-per-GB basis, so that EMC will not have to discount it as much to compete against IBM.
technorati tags: IBM, XIV, Gen3, EMC, DMX-4, VMAX, V-Max, HDS, N7900, SONAS, DS8000, CKD, FICON, TCO
Fellow master inventor and blogger Barry Whyte (IBM) recounts the past 20 years of history in IT storage from his perspective in a series of blog posts. They are certainly worth a read:
In his last post in this series, he mentions that the amazingly successful IBM SAN Volume Controller was part of a set of projects:
"IBM was looking for "new horizon" projects to fund at the time, and three such projects were proposed and created the "Storage Software Group". Those three projects became know externally as TPC, (TotalStorage Productivity Center), SanFS (SAN File System - oh how this was just 5 years too early) and SVC (SAN Volume Controller). The fact that two out of the three of them still exist today is actually pretty good. All of these products came out of research, and its a sad state of affairs when research teams are measured against the percentage of the projects they work on, versus those that turn into revenue generating streams."
But this raises the question: Was SAN File System just five years too early?
IBM classifies products into three "horizons": Horizon-1 for well-established mature products, Horizon-2 for recently launched products, and Horizon-3 for emerging business opportunities (EBO). Since I had some involvement with these other projects, I thought I would help fill out some of this history from my perspective.
Back in 2000, IBM executive [Linda Sanford] was in charge of IBM's storage business and announced that IBM Research was working on the concept of "Storage Tank", which would hold Petabytes of data accessible to mainframes and distributed servers.
In 2001, I was the lead architect of DFSMS for the IBM z/OS operating system for mainframes, and was asked to be lead architect for the new "Horizon 3" project to be called IBM TotalStorage Productivity Center (TPC), which has since been renamed to IBM Tivoli Storage Productivity Center.
In 2002, I was asked to lead a team to port the "SANfs client" for SAN File System from Linux-x86 over to Linux on System z. How easy or difficult it is to port any code depends on how well it was written with the intent to be ported, and porting the "proof-of-concept" level code proved a bit too challenging for my team of relative new-hires. Once code written by research scientists is sufficiently complete to demonstrate proof of concept, it should be entirely discarded and written from scratch by professional software engineers that follow proper development and documentation procedures. We reminded management of this, and they decided not to make the necessary investment to add Linux on System z as a supported operating system for SAN File System.
In 2003, IBM launched Productivity Center, SAN File System and SAN Volume Controller. These would be lumped together with Horizon-1 product IBM Tivoli Storage Manager and the four products were promoted together as the inappropriately-named [TotalStorage Open Software Family]. We actually had long meetings debating whether SAN Volume Controller was hardware or software. While it is true that most of the features and functions of SAN Volume Controller are driven by its software, it was never packaged as a software-only offering.
The SAN File System was the productized version of the "Storage Tank" research project. While the SAN Volume Controller used industry standard Fibre Channel Protocol (FCP) to allow support of a variety of operating system clients, the SAN File System required an installed "client" that was only available initially on AIX and Linux-x86. In keeping with the "open" concept, an "open source reference client" was made available so that the folks at Hewlett-Packard, Sun Microsystems and Microsoft could port this over to their respective HP-UX, Solaris and Windows operating systems. Not surprisingly, none were willing to voluntarily add yet another file system to their testing efforts.
Barry argues that SANfs was five years ahead of its time. SAN File System tried to bring policy-based management for information, which has been part of DFSMS for z/OS since the 1980s, over to distributed operating systems. The problem is that mainframe people who understand and appreciate the benefits of policy-based management already had it, and non-mainframe people couldn't understand the benefits of something they had managed to survive without.
(Every time I see VMware presented as a new or clever idea, I have to remind people that this x86-based hypervisor basically implements the mainframe concept of server virtualization introduced by IBM in the 1970s. IBM is the leading reseller of VMware, and supports other server virtualization solutions including Linux KVM, Xen, Hyper-V and PowerVM.)
To address the various concerns about SAN File System, the proof-of-concept code from IBM Research was withdrawn from marketing, and fresh code implementing these concepts was integrated into IBM's existing General Parallel File System (GPFS). This software would then be packaged with a server hardware cluster, exporting global file spaces with broad operating system reach. Initially offered as the IBM Scale-out File Services (SoFS) service offering, this was later re-packaged as an appliance, the IBM Scale-Out Network Attached Storage (SONAS) product, and as the IBM Smart Business Storage Cloud (SBSC) cloud storage offering. These now offer clustered NAS storage using the industry standard NFS and CIFS clients that nearly all operating systems already have.
Today, these former Horizon-3 products are now Horizon-2 and Horizon-1. They have evolved. Tivoli Storage Productivity Center, GPFS and SAN Volume Controller are all market leaders in their respective areas.
technorati tags: IBM, Barry Whyte, TPC, SANfs, SVC, EBO, Storage Tank, SoFS, SONAS, SBSC, cloud storage, NAS, storage, NFS, CIFS, HTTP
“In times of universal deceit, telling the truth will be a revolutionary act.”
-- George Orwell
Well, it has been over two years since I first covered IBM's acquisition of the XIV company. Amazingly, I still see a lot of misperceptions out in the blogosphere, especially those regarding double drive failures for the XIV storage system. Despite various attempts to [explain XIV resiliency] and to [dispel the rumors], there are still competitors making stuff up, putting fear, uncertainty and doubt into the minds of prospective XIV clients.
Clients love the IBM XIV storage system! In this economy, companies are not stupid. Before buying any enterprise-class disk system, they ask the tough questions, run evaluation tests, and all the other due diligence often referred to as "kicking the tires". Here is what some IBM clients have said about their XIV systems:
“3-5 minutes vs. 8-10 hours rebuild time...”
-- satisfied XIV client
“...we tested an entire module failure - all data is re-distributed in under 6 hours...only 3-5% performance degradation during rebuild...”
-- excited XIV client
“Not only did XIV meet our expectations, it greatly exceeded them...”
-- delighted XIV client
In this blog post, I hope to set the record straight. It is not my intent to embarrass anyone in particular, so instead I will focus on a fact-based approach.
- Fact: IBM has sold THOUSANDS of XIV systems
XIV is "proven" technology with thousands of XIV systems in company data centers. And by systems, I mean full disk systems with 6 to 15 modules in a single rack, twelve drives per module. That equates to hundreds of thousands of disk drives in production TODAY, comparable to the number of disk drives studied by [Google] and [Carnegie Mellon University], which I discussed in my blog post [Fleet Cars and Skin Cells].
- Fact: To date, no customer has lost data as a result of a Double Drive Failure on XIV storage system
This has always been true, both when XIV was a stand-alone company and since the IBM acquisition two years ago. When examining the resilience of an array to any single or multiple component failures, it's important to understand the architecture and the design of the system and not assume all systems are alike. At its core, XIV is a grid-based storage system. IBM XIV does not use traditional RAID-5 or RAID-10 methods, but instead data is distributed across loosely connected data modules which act as independent building blocks. XIV divides each LUN into 1MB "chunks", and stores two copies of each chunk on separate drives in separate modules. We call this "RAID-X".
Spreading all the data across many drives is not unique to XIV. Many disk systems, including EMC CLARiiON-based V-Max, HP EVA, and Hitachi Data Systems (HDS) USP-V, allow customers to get XIV-like performance by spreading LUNs across multiple RAID ranks. This is known in the industry as "wide-striping". Some vendors use the terms "metavolumes" or "extent pools" to refer to their implementations of wide-striping. Clients have coined their own phrases, such as "stripes across stripes", "plaid stripes", or "RAID 500". It is highly unlikely that an XIV will experience a double drive failure that ultimately requires recovery of files or LUNs, and an XIV is substantially less vulnerable to data loss than an EVA, USP-V or V-Max configured in RAID-5. Fellow blogger Keith Stevenson (IBM) compared XIV's RAID-X design to other forms of RAID in his post [RAID in the 21st Century].
- Fact: IBM XIV is designed to minimize the likelihood and impact of a double drive failure
The independent failure of two drives is a rare occurrence. More data has been lost from hash collisions on EMC Centera than from double drive failures on XIV, and hash collisions are also very rare. While the published worst-case time to re-protect from a 1TB drive failure for a fully-configured XIV is 30 minutes, field experience shows XIV regaining full redundancy on average in 12 minutes. That exposure window is at least 40 times shorter than the typical 8-10 hour window for a RAID-5 configuration, which makes a second drive failure during the rebuild correspondingly less likely.
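The arithmetic behind the "40 times" figure can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes the chance of a second failure scales roughly with the length of the unprotected window, which is a simplification.

```python
# Compare the time each design spends without full redundancy after a
# first drive failure. The figures are the ones quoted in the post.
XIV_REBUILD_MINUTES = 12          # field-average time to regain redundancy
RAID5_REBUILD_HOURS = (8, 10)     # typical RAID-5 rebuild window

def window_ratio(raid_hours: float, xiv_minutes: float) -> float:
    """How many times longer the RAID-5 window is than the XIV window."""
    return raid_hours * 60 / xiv_minutes

low, high = (window_ratio(h, XIV_REBUILD_MINUTES) for h in RAID5_REBUILD_HOURS)
print(f"RAID-5 exposure window is {low:.0f}x to {high:.0f}x longer")
```

The 8-hour end of the range gives the factor of 40; the 10-hour end gives 50.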
A lot of bad things can happen in those 8-10 hours of traditional RAID rebuild. Performance can be seriously degraded. Other components may be affected, as they share cache, connected to the same backplane or bus, or co-dependent in some other manner. An engineer supporting the customer onsite during a RAID-5 rebuild might pull the wrong drive, thereby causing a double drive failure they were hoping to avoid. Having IBM XIV rebuild in only a few minutes addresses this "human factor".
In his post [XIV drive management], fellow blogger Jim Kelly (IBM) covers a variety of reasons why storage admins feel double drive failures are more than just random chance. XIV avoids load stress normally associated with traditional RAID rebuild by evenly spreading out the workload across all drives. This is known in the industry as "wear-leveling". When the first drive fails, the recovery is spread across the remaining 179 drives, so that each drive only processes about 1 percent of the data. The [Ultrastar A7K1000] 1TB SATA disk drives that IBM uses from HGST have a specified mean-time-between-failures [MTBF] of 1.2 million hours, which would average about one drive failing every nine months in a 180-drive XIV system. However, field experience shows that an XIV system will experience, on average, one drive failure per 13 months, comparable to what companies experience with more robust Fibre Channel drives. That's innovative XIV wear-leveling at work!
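For readers who want to verify the nine-month figure, here is a rough calculation in Python. It assumes drives fail independently at a constant rate, which is an idealization, and the function name is only illustrative.

```python
HOURS_PER_MONTH = 24 * 365.25 / 12   # about 730.5 hours

def months_between_failures(mtbf_hours: float, drives: int) -> float:
    """Expected months between drive failures across a population of drives,
    assuming independent failures at a constant rate (an idealization)."""
    return mtbf_hours / drives / HOURS_PER_MONTH

# Specified MTBF of 1.2 million hours across a 180-drive system
spec = months_between_failures(1_200_000, 180)
print(f"Specified MTBF implies one failure every {spec:.1f} months")

# Working backwards: the per-drive MTBF implied by one failure per 13 months
implied_mtbf = 13 * HOURS_PER_MONTH * 180
print(f"Field experience implies an MTBF of about {implied_mtbf:,.0f} hours")
```

Under these assumptions, one failure per 13 months corresponds to a per-drive MTBF of roughly 1.7 million hours.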
- Fact: In the highly unlikely event that a DDF were to occur, you will have full read/write access to nearly all of your data on the XIV, all but a few GB.
Even though it has NEVER happened in the field, some clients and prospects are curious what a double drive failure on an XIV would look like. First, a critical alert message would be sent to both the client and IBM, and a "union list" is generated, identifying all the chunks in common. The worst case on a 15-module XIV fully loaded with 79TB data is approximately 9000 chunks, or 9GB of data. The remaining 78.991 TB of unaffected data are fully accessible for read or write. Any I/O requests for the chunks in the "union list" will have no response yet, so there is no way for host applications to access outdated information or cause any corruption.
(One blogger compared losing data on the XIV to drilling a hole through the phone book. Mathematically, the drill bit would be only 1/16th of an inch, or 1.60 millimeters for you folks outside the USA. Enough to knock out perhaps one character from a name or phone number on each page. If you have ever seen an actor in the movies look up a phone number in a telephone booth then yank out a page from the phone book, the XIV equivalent would be cutting out 1/8th of a page from an 1100 page phone book. In both cases, all of the rest of the unaffected information is fully accessible, and it is easy to identify which information is missing.)
If the second drive failed several minutes after the first drive, the process for full redundancy is already well under way. This means the union list is considerably shorter or completely empty, and substantially fewer chunks are impacted. Contrast this with RAID-5, where being 99 percent complete on the rebuild when the second drive fails is just as catastrophic as having both drives fail simultaneously.
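To make the "union list" idea concrete, here is a small Python simulation. It is a toy model only: the uniform random placement, the chunk count, and the function names are assumptions for illustration, not the actual XIV distribution algorithm, and it makes no attempt to reproduce the 9000-chunk worst case.

```python
import random

MODULES, DRIVES_PER_MODULE = 15, 12   # full rack as described in the post

def place_chunks(n_chunks: int, seed: int = 0) -> list[tuple[int, int]]:
    """Assign each chunk two copies on drives in different modules.
    Uniform random placement is an assumption for illustration only."""
    rng = random.Random(seed)
    placement = []
    for _ in range(n_chunks):
        m1, m2 = rng.sample(range(MODULES), 2)   # two distinct modules
        d1 = m1 * DRIVES_PER_MODULE + rng.randrange(DRIVES_PER_MODULE)
        d2 = m2 * DRIVES_PER_MODULE + rng.randrange(DRIVES_PER_MODULE)
        placement.append((d1, d2))
    return placement

def union_list(placement, failed_a: int, failed_b: int) -> list[int]:
    """Chunk ids whose two copies both sit on the failed drives."""
    failed = {failed_a, failed_b}
    return [i for i, copies in enumerate(placement) if set(copies) == failed]

placement = place_chunks(200_000)
affected = union_list(placement, failed_a=0, failed_b=179)
print(f"{len(affected)} of {len(placement)} chunks lost both copies")
```

Because the two copies of a chunk never share a module in this model, two failed drives in the same module produce an empty list, and drives in different modules share only a tiny fraction of all chunks.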
- Fact: After a DDF event, the files on these few GB can be identified for recovery.
Once IBM receives notification of a critical event, an IBM engineer immediately connects to the XIV using the remote service support method. There is no need to send someone physically onsite; the repair actions can be done remotely. The IBM engineer has tools from HGST to recover, in most cases, all of the data.
Any "union" chunk that the HGST tools are unable to recover will be set to "media error" mode. The IBM engineer can provide the client a list of the XIV LUNs and LBAs that are on the "media error" list. From this list, the client can determine which hosts these LUNs are attached to, and run a file scan utility against the file systems that these LUNs represent. Files that get a media error during this scan will be listed as needing recovery. A chunk could contain several small files, or the chunk could be just part of a large file. To minimize time, the scans and recoveries can all be prioritized and performed in parallel across host systems zoned to these LUNs.
As with any file or volume recovery, keep in mind that these might be part of a larger consistency group, and that your recovery procedures should make sense for the applications involved. In any case, you are probably going to be up-and-running in less time with XIV than recovery from a RAID-5 double failure would take, and certainly nowhere near the "beyond repair" scenario that other vendors might have you believe.
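As a rough idea of what such a file scan might look like on a host, here is a minimal Python sketch. It is not an IBM utility: the function name and the mount point in the usage comment are hypothetical, and it simply assumes that reading a block marked "media error" surfaces to the application as an I/O error.

```python
import os

def find_unreadable_files(root: str, block_size: int = 1 << 20) -> list[str]:
    """Read every file under root; return paths that raise an I/O error."""
    damaged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    while handle.read(block_size):
                        pass          # read through the whole file
            except OSError:
                damaged.append(path)  # candidate for restore from backup
    return damaged

# Hypothetical usage, one scan per affected file system:
# for path in find_unreadable_files("/mnt/xiv_lun0"):
#     print(path)
```

Note that a real scan would need to bypass any host-side caching so that the reads actually reach the storage system; this sketch does not attempt that.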
- Fact: This does not mean you can eliminate all Disaster Recovery planning!
To put this in perspective, you are more likely to lose XIV data from an earthquake, hurricane, fire or flood than from a double drive failure. As with any unlikely disaster, it is better to have a disaster recovery plan than to hope it never happens. All disk systems that sit on a single datacenter floor are vulnerable to such disasters.
For mission-critical applications, IBM recommends using a disk mirroring capability. The IBM XIV storage system offers synchronous and asynchronous mirroring natively, both included at no additional charge.
For more about IBM XIV reliability, read this whitepaper [IBM XIV© Storage System: Reliability Reinvented]. To find out why so many clients LOVE their XIV, contact your local IBM storage sales rep or IBM Business Partner.
technorati tags: IBM, XIV, DDF, RAID-5, RAID-10, RAID-X, RAID-6, RAID-DP, HP, EVA, HDS, USP-V, EMC, CLARiiON, V-Max, Disaster Recovery, HGST, UltraStar, A7K1000
I nearly fell out of my chair laughing.
Nigel Poulton over at Ruptured Monkey suggests a variety of nicknames for the various storage bloggers, in his post [Storage Blogwars and the Vendor Fight Club].
Of these, fellow blogger Marc Farley suggested for me "Tony Late for Dinner Pearson", which is fair, I guess, given that I often work late to make sure my blog posts are well written, and sometimes that means I am the last to leave the building.
Full Disclosure: I've known Marc for a while now, we have attended events together and even were co-speakers on a conference call for customers.
Perhaps more disturbing is that, for the most part, the storage blogosphere is entirely dominated by men. Where are the women bloggers for storage?
technorati tags: IBM, Late For Dinner, Marc Farley, Nigel Poulton, RupturedMonkey
People are confused over various orders of magnitude. News of the economic meltdown often blurs the distinction between millions (10^6), billions (10^9), and trillions (10^12). To show how different these three numbers are, consider the following:
- A million seconds ago - you might have received your last paycheck (12 days)
- A billion seconds ago - you were born or just hired on your current job (31 years)
- A trillion seconds ago - cavemen were walking around in Asia (31,000 years)
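These conversions are easy to check. Here is a quick Python sketch, assuming a year of 365.25 days; the function names are just for illustration.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def seconds_to_days(seconds: float) -> float:
    return seconds / SECONDS_PER_DAY

def seconds_to_years(seconds: float) -> float:
    return seconds / SECONDS_PER_YEAR

print(f"10^6 s  = {seconds_to_days(10**6):.1f} days")
print(f"10^9 s  = {seconds_to_years(10**9):.1f} years")
print(f"10^12 s = {seconds_to_years(10**12):,.0f} years")
```

A million seconds is a bit under 12 days, a billion is just under 32 years, and a trillion is just under 32,000 years, so the figures in the list are rounded.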
That these numbers confuse the average person is no surprise, but that they confuse marketing people in the storage industry is even more hilarious. I am often correcting people who misunderstand MB (million bytes), GB (billion bytes) and TB (trillion bytes) of information. Take this graph as an example from a recent presentation.
At first, it looks reasonable: back in 2004, black-and-white 2D X-Ray images were only 1MB in size when digitized, but by 2010 there will be fancy 4D images that take 1TB, representing a 1000x increase. What? When I pointed out this discrepancy, the person who put this chart together didn't know what to fix. Were 4D images only 1GB in size, or was it really a 1000000x increase?
If a 2D image was 1000 by 1000 pixels, and each pixel was a byte of information, then a 3D image might either be 1000 by 1000 by 1000 [voxels], or 1000 by 1000 at 1000 frames per second (fps). The first is 3D volumetric space, and the latter is called 2D+time in the medical field; the rest of us just say "video". 4D images are 3D+time, volumetric scans over time, so conceivably these could be quite large in size.
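Here is that reasoning as a short Python sketch. The one-byte-per-sample figure comes from the example above; using 1000 time samples for the 4D case is my own assumption to complete the pattern.

```python
def image_bytes(*dimensions: int, bytes_per_sample: int = 1) -> int:
    """Raw size of an uncompressed image: product of its dimensions."""
    size = bytes_per_sample
    for extent in dimensions:
        size *= extent
    return size

two_d = image_bytes(1000, 1000)                 # 2D image
three_d = image_bytes(1000, 1000, 1000)         # 3D volume, or 2D+time
four_d = image_bytes(1000, 1000, 1000, 1000)    # 3D+time (assumed 1000 samples)

print(f"2D: {two_d:,} bytes, 3D: {three_d:,} bytes, 4D: {four_d:,} bytes")
print(f"4D is {four_d // two_d:,}x the size of 2D")
```

Each added dimension of 1000 samples multiplies the size by 1000, so going from 1MB to 1TB is a 1000000x increase, not 1000x.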
The key point is that advances in medical equipment result in capturing more data, which can help provide better healthcare. This would be the place I normally plug an IBM product, like the Grid Medical Archive Solution [GMAS], a blended disk and tape storage solution designed specifically for this purpose.
So, as government agencies look to spend billions of dollars to provide millions of people with proper healthcare, choosing to spend some of this money on a smarter infrastructure can result in creating thousands of jobs and save everyone a lot of money, but more importantly, save lives.
For more on this, check out Adam Christensen's blog post on [Smarter Planet], which points to a podcast by Dr. Russ Robertson, chairman of the Counsel of Medical Education at Northwestern University’s Feinberg School of Medicine, and Dan Pelino, general manager of IBM's Healthcare and Life Sciences Industry.
technorati tags: IBM, smarter healthcare, 2D+time, 3D+time, 4D, medical images, Adam Christensen, Russ Robertson, Dan Pelino
Wrapping up this week's theme on ways to make the planet smarter, and less confusing, I present IBM's third annual [five in five]. These are five IBM innovations to watch over the next five years, all of which have implications on information storage. Here is a quick [3-minute video] that provides the highlights:
technorati tags: IBM, five-in-five, innovations, solar, health, talking Web, shopping assistants, forgetting
This week is Thanksgiving holiday in the USA, so I thought a good theme would be things I am thankful for.
I'll start with saying that I am thankful EMC has finally announced Atmos last week. This was the "Maui" part of the Hulk/Maui rumors we heard over a year ago. To quickly recap, Atmos is EMC's latest storage offering for global-scale storage intended for Web 2.0 and Digital Archive workloads. Atmos can be sold as just software, or combined with Infiniflex, EMC's bulk, high-density commodity disk storage systems. Atmos supports traditional NFS/CIFS file-level access, as well as SOAP/REST object protocols.
I'm thankful for various reasons. Here's a quick list:
- It's hard to compete against "vaporware"
Back in the 1990s, IBM was trying to sell its actual disk systems against StorageTek's rumored "Iceberg" project. It took StorageTek some four years to get this project out, but in the meantime, we were comparing actual versus possibility. The main feature is what we now call "Thin Provisioning". Ironically, StorageTek's offering was not commercially successful until IBM agreed to resell this as the IBM RAMAC Virtual Array (RVA).
Until last week, nobody knew the full extent of what EMC was going to deliver on the many Hulk/Maui theories. Several hinted as to what it could have been, and I am glad to see that Atmos falls short of those rumored possibilities. This is not to say that Atmos can't reach its potential, and certainly some of the design is clever, such as offering native SOAP/REST access.
Instead, IBM now can compare Atmos/Infiniflex directly to the features and capabilities of IBM's Scale Out File Services [SoFS], which offers a global-scale multi-site namespace with policy-based data movement, IBM System Storage Multilevel Grid Access Manager [GAM] that manages geographically distributed information, and IBM [XIV Storage System] that offers high-density bulk storage.
- Web 2.0 and Digital Archive workloads justify new storage architectures
When I presented SoFS and XIV earlier this year, I mentioned they were designed for the fast-growing Web 2.0 and Digital Archive workloads that were unique enough to justify their own storage architectures. One criticism was that SoFS appeared to duplicate what could be achieved with dozens of IBM N series NAS boxes connected with Virtual File Manager (VFM). Why invent a new offering with a new architecture?
With the Atmos announcement, EMC now agrees with IBM that the Web 2.0 and Digital Archive workloads represent a unique enough "use case" to justify a new approach.
- New offerings for new workloads will not impact existing offerings for existing workloads
I find it amusing that EMC is quickly defending that Atmos will not eat into its DMX business, which is exactly the FUD they threw out about IBM XIV versus DS8000 earlier this year. In reality, neither the DS8000 nor the DMX were used much for Web 2.0 and Digital Archive workloads in the past. Companies like Google, Amazon and others had to either build their own from piece parts, or use low-cost midrange disk systems.
Rather, the DS8000 and DMX can now focus on the workloads they were designed for, such as database applications on mainframe servers.
- Cloud-Oriented Storage (COS)
Just when you thought we had enough terminology already, EMC introduces yet another three-letter acronym [TLA]. Kudos to EMC for coining phrases to help move new concepts forward.
Now, when an RFP asks for Cloud-oriented storage, I am thankful this phrase will help serve as a trigger for IBM to lead with SoFS and XIV storage offerings.
- Digital archives are different than Compliance Archives
EMC was also quick to point out that object-storage Atmos was different from their object-storage EMC Centera. The former being for "digital archives" and the latter for "compliance archives". Different workloads, different use cases, different offerings.
Ever since IBM introduced its [IBM System Storage DR550] several years ago, EMC Centera has been playing catch-up to match IBM's many features and capabilities. I am thankful the Centera team was probably too busy to incorporate Atmos capabilities, so it was easier to make Atmos a separate offering altogether. This allows the IBM DR550 to continue to compete against Centera's existing feature set.
- Micro-RAID arrays, logical file and object-level replication
I am thankful that one of the Atmos policy-based features is replicating individual objects, rather than LUN-based replication and protection. SoFS supports this for logical files regardless of their LUN placement, GAM supports replication of files and medical images across geographical sites in the grid, and the XIV supports this for 1MB chunks regardless of their hard disk drive placement. The 1MB chunk size was based on the average object size from established Web 2.0 and Digital Archive workloads.
I tried to explain the RAID-X capability of the XIV back in January, under much criticism that replication should only be done at the LUN level. I am thankful that Marc Farley on StorageRap coined the phrase [Micro-RAID array] to help move this new concept further. Now, file-level, object-level and chunk-level replication can be considered mainstream.
- Much larger minimum capacity increments
The original XIV in January was 51TB capacity per rack, and this went up to 79TB per rack for the most recent IBM XIV Release 2 model. Several complained that nobody would purchase disk systems at such increments. Certainly, small and medium size businesses may not consider XIV for that reason.
I am thankful Atmos offers 120TB, 240TB and 360TB sizes. The companies that purchase disk for Web 2.0 and Digital Archive workloads do purchase disk capacity in these large sizes. Service providers add capacity to the "Cloud" to support many of their end-clients, and so purchasing disk capacity to rent back out represents revenue generating opportunity.
- Renewed attention on SOAP and REST protocols
IBM and Microsoft have been pushing SOA and Web Services for quite some time now. REST, which stands for [Representational State Transfer] allows static and dynamic HTML message passing over standard HTTP. SOAP, which was originally [Simple Object Access Protocol], and then later renamed to "Service Oriented Architecture Protocol", takes this one step further, allowing different applications to send "envelopes" containing messages and data between applications using HTTP, RPC, SMTP and a variety of other underlying protocols. Typically, these messages are simple text surrounded by XML tags, easily stored as files, or rows in databases, and served up by SOAP nodes as needed.
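To show what "simple text surrounded by XML tags" looks like in practice, here is a minimal Python sketch that builds a SOAP 1.1 envelope with the standard library. The operation name `GetObject` and the `ObjectId` parameter are hypothetical placeholders, not part of any Atmos or IBM API.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 namespace

def build_envelope(operation: str, params: dict[str, str]) -> str:
    """Wrap a hypothetical request in a SOAP Envelope/Body and return XML text."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(request, name).text = value
    return ET.tostring(envelope, encoding="unicode")

# Placeholder operation and parameter, for illustration only
message = build_envelope("GetObject", {"ObjectId": "example-123"})
print(message)
```

The result is plain text, so it can travel over HTTP, SMTP or another transport, and can just as easily be stored in a file or a database row.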
- It's hard to show leadership until there are followers
IBM's leadership sometimes goes unnoticed until followers create "me, too!" offerings or establish similar business strategies. IBM's leadership in Cloud and Grid computing is no exception. Atmos is the latest me-too product offering in this space, trying pretty much to address the same challenges that SoFS and XIV were designed for.
So, perhaps EMC is thankful that IBM has already paved the way, breaking through the ice on their behalf. I am thankful that perhaps I won't have to deal with as much FUD about SoFS, GAM and XIV anymore.
technorati tags: IBM, SoFS, XIV, GAM, DS8000, EMC, Atmos, Hulk, Maui, Infiniflex, STK, StorageTek, Iceberg, RVA, thin provisioning, VFM, SOAP, REST, DMX, RAID-X, Micro-RAID
In his post on Rough Type titled ["McKinsey surveys the new software landscape"], Nick Carr discusses the growing acceptance in the marketplace for Software-as-a-Service, or SaaS. He summarizes the results of McKinsey's recent [Enterprise Software Customer Survey 2008]. IBM is already well established as part of the Web 2.0 Big "5" (the other four are Google, Yahoo, Amazon and Microsoft), so it may not be much of a surprise that it introduced some new offerings focused on this emerging market.
Whether you are looking to contract out for SaaS, or to provide a service to others over the cloud, IBM can help!
- Managed Hosting
For managed hosting, [IBM Managed Storage Services] has been extended to support archive data through its entire lifecycle: supporting access, migration, non-erasable non-rewriteable (NENR) protection, and expiration/destruction. This offering supports locating the storage on the customer premises, a hosting center, or an IBM Service Delivery Center. IBM's blended disk and tape approach provides a better alignment between information value and storage costs.
- Application-Led Service
Last December IBM acquired Arsenal Digital, which offers a remote "Enterprise Email Archive" service, supporting retention policies that can apply per user, per group, or even per message, as needed. This service provides fast user access to email archives, as well as e-discovery search. The search is not just for the email body text, but supports over 370 different attachment types as well. Deduplication technology is used to reduce the actual amount of storage needed by 80 percent. All of this with the security and comfort of knowing that these email archives are encrypted and protected in a disaster recovery class datacenter managed by IBM. Blocks and Files presents their thoughts on this in the article ["IBM storing data and mail in the cloud"].
The Radicati Group has published some interesting statistics about email archive in [Volume 4, Issue 3]. Here's an excerpt:
- "In 2007, a typical corporate email account receives about 18 MB of data per day. This number is expected to grow to over 28 MB by 2011. Today, there is no way to effectively manage these messages, but with the help of an archiving solution.
- Today, the worldwide percentage of corporate mailboxes protected by archiving solutions is estimated to be around 14%, however it is growing at a fast pace, and is expected to reach over 70% by 2011.
- A survey of 102 corporate organizations worldwide, showed that 68% of large businesses view compliance as their top security concern in 2007."
- Cloud Computing
For those who are actually providing these services to others, over the cloud, then you might want to use the new [IBM System x iDataPlex]. Compared to traditional server environments, the iDataPlex provides five times the computing power by doubling the number of servers per rack, but with 40 percent less energy consumption. Thanks to clever cooling technology, the system can run in standard office "room temperature" environments. You can customize with a mix of compute, network and storage nodes to meet your application requirements. In addition to Web 2.0 and SaaS workloads, the iDataPlex can be useful for financial risk analysis, high performance computing, and even batch processing.
technorati tags: Rough Type, Nick Carr, McKinsey, SaaS, Google, Yahoo, Amazon, Microsoft, managed hosting, storage services, NENR, archive, IBM, Service Delivery Center, Arsenal Digital, deduplication, Radicati Group, iDataPlex, Web2.0
Well, today is April 1, and I just love [April Fools' Day]. This day has a rich history of practical jokes. Those not familiar can review this list of [Top 100 pranks and hoaxes].
Tim Ferriss started the festivities with [The Grand Illusion: The Real Tim Ferriss speaks]. He claimed that for the past year, he outsourced the writing of his blog to a writer from India, and an editor from the Philippines. Given that his post was dated March 31, and he writes frequently about the benefits of outsourcing, it appeared like a legitimate post. However, Tim fessed up the following day, claiming that it was April 1 in Japan where he wrote it.
Guy Kawasaki wrote [April Fools' Stories You Shouldn't Believe] including my favorite #12 "Ruby on Rails cited Twitter as the centerpiece of its new 'Rails Can Scale' marketing program." Speaking of Twitter, fellow IBM blogger Alan Lepofsky from our Lotus Notes team wrote [Great, now there is Twitter Spam]. It looked like a real post, but then I realized, ... everything on Twitter is spam!
Topics like energy consumption and global warming were fodder for posts and pranks. The post [Was Earth Hour a joke again?] argued that the preparation of "Earth Hour" last week in effect used up more energy than the hour of this annual "lights-off event" actually saved. This reminded me of John Tierney's piece in the New York Times ["How virtuous is Ed Begley, Jr.?"] where a scientist explains that it is more "green" for the environment to drive a car short distances than to walk:
If you walk 1.5 miles, Mr. Goodall calculates, and replace those calories by drinking about a cup of milk, the greenhouse emissions connected with that milk (like methane from the dairy farm and carbon dioxide from the delivery truck) are just about equal to the emissions from a typical car making the same trip. And if there were two of you making the trip, then the car would definitely be the more planet-friendly way to go.
Wayan Vota, my buddy over at OLPCnews, writes in his post [Windows XO Child Centric Development] that the "Sugar" operating environment on the innovative Linux-based XO laptops will soon be re-named the "Windows XO Operating System", with their new motto "Windows XO: A Child-Centric Operating Platform for Learning, Expression and Exploration." The mocked-up photo of an XO laptop with the Windows XO logo was excellent!
Gretchen Rubin reminds us that this is a great day to play tricks on your kids in [How April Fool’s day can be a source of happiness], and last week, Kai Ryssdal on NPR Radio investigated whether [Mind Habits] was [a video game that's good for you?] The game claims that playing just five minutes per day can reduce stress. I haven't been able to stop playing after five minutes. Mind Habits is like the proverbial potato chip: you can't eat just one!
The economists from Freakonomics explain in [And While You're at it, Toss the Nickel] that it costs the US Government 1.7 cents to produce each penny. The US government loses $50 million each year making pennies. Each nickel costs 10 cents to produce. This one was dated March 31, so it could actually be true. Sad, but true.
My favorite, however, was EMC blogger Barry Burke's post ["5773 > c"] explaining how their scientists were able to reduce latency on the EMC SRDF disk replication capability:
What the de-dupe team found is that there is a hidden feature within recent generations of this chip that allow a single bit, under certain circumstances, to represent TWO bits of information.
Still, almost 34% of the total bits transferred were in fact aligned double-zeros, far more than all other bit combinations - and most importantly, these were quite frequently byte-aligned, as required by this new-found capability. Makes sense, if you think about it - most of those 32- and 64-bit integers are used to store numbers that are relatively small (years, months, days, credit charges, account balances, etc.). So that's why the team decided to use this new two-fer bit to represent "00".
Mathematically, if you can transmit 34% of the data using half as many bits, you reduce the number of bits you have to transfer in total by 17%. Which, while not necessarily earth-shattering, is nothing to be ashamed of. On top of the SRDF performance enhancements delivered in 5772 (30% reduction in latency or 2x the distance), this new enhancement adds another 17% latency improvement (or ~1.4x more distance at the same latency). Combined with 5772, SRDF/S customers could see a 50% reduction in latency. And 5773 allows SRDF/A cycle times to be set below 5 seconds (with RPQ) - this new feature adds a little headroom to maximize bandwidth efficiency for the shortest possible RPO.
Again, this looked real, until I did the math. Start with the speed of light in a vacuum of space ("c" in BarryB's title) which is roughly 300,000 kilometers per second, or put into more understandable units, 300 kilometers per millisecond. However, light travels slower through all other materials, and for fiber optic glass it is only 200 kilometers per millisecond. Sending a block of data across 100km, and then getting a response back that it arrived safely, is a total round-trip distance of 200km, so roughly 1 millisecond. However, EMC SRDF often takes two or three round-trips per write, versus IBM Metro Mirror on the IBM System Storage DS8000 which has got this down to a single round-trip. The number of round-trips has a much bigger effect on latency than EMC's double-bit data compression technique. With IBM, you only experience about 1 millisecond latency per write for every 100km distance between locations, the shortest latency in the industry.
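For readers who want to redo the math, here is a minimal Python sketch of the two calculations above. It covers propagation delay only, using the 200 km per millisecond figure for fiber, and the function names are made up for this illustration; they do not come from EMC or IBM.

```python
# Back-of-the-envelope figures from the text; propagation delay only,
# ignoring switch, controller, and protocol overhead.
FIBER_KM_PER_MS = 200.0  # approximate speed of light in fiber optic glass


def write_latency_ms(distance_km, round_trips=1):
    """Each round trip covers the distance twice (there and back)."""
    return round_trips * (2.0 * distance_km) / FIBER_KM_PER_MS


def bits_saved_fraction(share_of_data, size_ratio=0.5):
    """Fraction of all bits saved when `share_of_data` shrinks to `size_ratio`."""
    return share_of_data * (1.0 - size_ratio)


if __name__ == "__main__":
    for trips in (1, 2, 3):
        print(trips, "round trip(s) over 100 km:", write_latency_ms(100, trips), "ms")
    print("fraction of bits saved by the 'two-fer' bit:", bits_saved_fraction(0.34))
```

Going from three round trips to one cuts propagation delay by two thirds, which dwarfs a 17 percent reduction in bits sent.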
It is good to be reminded once a year to be skeptical of what you read in the blogosphere, and to check the facts now and then!
technorati tags: April Fools Day, Tim Ferris, 4HWW, outsourcing, Guy Kawasaki, Ruby on Rails, Twitter, Alan Lepofsky, Lotus, Notes, Earth Hour, spam, John Tierney, Ed Begley Jr., milk, carbon dioxide, Wayan Vota, OLPCnews, Windows XO, Gretchen Rubin, Kai Ryssdal, Freakonomics, NPR, Mind Habits, penny, nickel, EMC, BarryB, SRDF, IBM, DS8000, Metro Mirror, latency, fiber optic, speed of light
I got some interesting queries about IBM's Scale-Out File Services [SoFS] that I mentioned in my post yesterday [Area rugs versus Wall-to-Wall carpeting]. I thought I would provide some additional details of the product.
SoFS combines three key features: a global namespace, a clustered file system, and Information Lifecycle Management (ILM). Let's tackle each one.
- Global Name Space
A long time ago, IBM acquired a company called Transarc that developed Andrew File System (AFS) and Distributed File System (DFS). These both provided global namespace capability, meaning that all of your files could be accessible from a single URL file tree. Imagine if you have data centers in Tucson, Austin, Raleigh and Chicago. Normally, to access files from each city, you would have to mount a unique IP address for that location, and then to get to files in a different city, you'd have to mount a second, and so on. But with a global namespace, you could mount a single drive letter Z: and access files simply by using Z:/Tucson/abc or Z:/Austin/xyz. IBM uses its DFS to make this happen.
Having access to a global namespace doesn't give you read/write authority to every file. IBM SoFS has full NTFS Access Control List (ACL) support, so that only those authorized to read or write data can access the files. A "hide unreadable" feature provides what I like to call "parental controls": you don't even get to see in your directory listing any file or subdirectory that you don't have access to. For example, if there is a directory with 50 projects, but you only have authority to three projects, then you only see the three subdirectories related to those projects, and nothing else.
There are other ways to get a global namespace. IBM also offers the IBM System Storage N series Virtual File Manager, Brocade offers Storage/X, and F5 acquired Acopia. These all work by putting a box in front of a set of independent NAS storage units, and giving you a single mount point to represent all of the file systems managed behind the scenes. This however can sometimes be a bottleneck for performance.
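To make the idea concrete, here is a small Python sketch of a global namespace lookup and a "hide unreadable" directory listing. It is only an illustration of the concept, not how SoFS or DFS is implemented; the server names, the SITES table, and both function names are hypothetical.

```python
# Hypothetical table mapping top-level namespace directories to site servers.
SITES = {
    "Tucson": "nas-tucson.example.com",
    "Austin": "nas-austin.example.com",
}


def resolve(path):
    """Split a global path such as 'Z:/Tucson/abc' into (server, local path)."""
    _drive, _, rest = path.partition(":/")
    site, _, local = rest.partition("/")
    return SITES[site], "/" + local


def visible_entries(entries, acl, user):
    """'Hide unreadable': list only the entries this user is allowed to read."""
    return [name for name in entries if user in acl.get(name, set())]


if __name__ == "__main__":
    print(resolve("Z:/Tucson/abc"))
    acl = {"proj01": {"ann"}, "proj07": {"ann", "bob"}}
    print(visible_entries(["proj01", "proj02", "proj07"], acl, "ann"))
```

The user types one path and never needs to know which server actually holds the file, and projects outside the user's ACL simply do not appear in the listing.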
- Clustered File System
Often, when you have a lot of data in one place, you are also expected to deliver that data to lots of clients with relatively good performance. Otherwise, end users revolt and get their own internal direct attach storage. To solve this, you need a clustered architecture that provides access in parallel to the data.
First, we start with a node that is optimized for CIFS and NFS access. We have clocked our node to run CIFS at 577 MB/sec, and NFS at 880 MB/sec, through a 10GbE pipe between a single client and a single SoFS node. Compare that to the 400 MB/sec you get today with 4 Gbps FCP, or the 800 MB/sec you will get if you upgrade to 8 Gbps FCP, and quickly you recognize that this is comparable performance for demanding workloads.
Then, you combine multiple nodes together, and have them all be able to read/write any file in the file system, and front-end that with a load-balancing Virtual IP address (VIPA) that spreads the requests around, and you've got yourself a lean and mean machine for accessing data.
In 2005, IBM delivered [ASC Purple] with the world's fastest file system. 1536 nodes were able to access billions of files in the 2 petabytes of data. The record of 126 GB/sec access to a single file was set, and has yet to be beaten by any other vendor. This same file system is used in SoFS, as well as a variety of other IBM storage offerings.
The back-end storage can be SAS or FC-attached, from the DS3200 to our mighty DS8300 Turbo, as well as our IBM System Storage DCS9550 and SAN Volume Controller (SVC), and a variety of tape libraries.
- Information Lifecycle Management
Lastly, we get to ILM. With SoFS, you can have different tiers of storage: high-speed SAS or FC disk, low-speed FATA and SATA disk, and even tape. Policy-based automation allows you to place any file onto any disk tier when created, and other policies can migrate or delete the data, triggered by certain thresholds, age, or other criteria. The advantage is that this is on a file by file basis, so Z:/Tucson/Project could have a bunch of files, some of them on my FC disk, some of them on my SATA, and some on tape. The file path doesn't change when they move, and different files in the same directory can be on different tiers.
Data movement is bi-directional. If you know you will be using a set of files for an upcoming job, say perhaps quarter-end or year-end processing, you can pre-fetch those files from tape and move them to your fastest disk pool.
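The Python sketch below illustrates the general idea of per-file, policy-driven tiering and pre-fetching. It is not the SoFS policy language; the tier names, the 30-day and 365-day thresholds, and every function name here are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str               # never changes when the file moves between tiers
    days_since_access: int
    tier: str = "fc"        # hypothetical tier names: "fc", "sata", "tape"


def choose_tier(f, sata_after_days=30, tape_after_days=365):
    """Age-based rule with made-up thresholds: older files go to cheaper tiers."""
    if f.days_since_access >= tape_after_days:
        return "tape"
    if f.days_since_access >= sata_after_days:
        return "sata"
    return "fc"


def apply_policy(files):
    """Migrate each file individually; return (path, old tier, new tier) moves."""
    moves = []
    for f in files:
        target = choose_tier(f)
        if target != f.tier:
            moves.append((f.path, f.tier, target))
            f.tier = target
    return moves


def prefetch(files, prefix, fast_tier="fc"):
    """Pull every file under `prefix` back to the fastest tier before a big job."""
    moved = 0
    for f in files:
        if f.path.startswith(prefix) and f.tier != fast_tier:
            f.tier = fast_tier
            moved += 1
    return moved
```

Note that files in the same directory end up on different tiers while their paths stay the same, which is the point of doing ILM file by file.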
There is also integrated backup support. Typically, a large NAS environment is difficult to back up. Traditional methods take days to scan the directory tree looking for files in need of backup. A single SoFS node can scan a billion files in 95 minutes, and 8 nodes in a cluster can scan a billion files in under 15 minutes.
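To put those scan times in perspective, this short Python calculation converts them to files per second. The helper name is made up for the illustration, and the 15-minute figure is treated as an upper bound since the text says "under 15 minutes".

```python
def files_per_second(files, minutes):
    """Average scan rate implied by scanning `files` files in `minutes` minutes."""
    return files / (minutes * 60)


if __name__ == "__main__":
    one_node = files_per_second(1_000_000_000, 95)
    eight_nodes = files_per_second(1_000_000_000, 15)  # a floor, given "under 15 minutes"
    print(f"1 node : about {one_node:,.0f} files/sec")
    print(f"8 nodes: at least {eight_nodes:,.0f} files/sec")
    print(f"speed-up from 8 nodes: at least {eight_nodes / one_node:.1f}x")
```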
Recovery is even more impressive. When you recover, SoFS brings back the entire directory structure first, with all the file names in place. This would make it appear that all the data is restored, but actually it is still on tape. When you access individual files, it will then drive the recovery of that file, so your applications and end users basically determine the priority of the recovery. Traditional methods would wait until every file was restored before letting anyone access the system.
SoFS is part of IBM's [Blue Cloud] initiative that was launched in November 2007. Of course, IBM isn't the only one competing in this space. HDS has partnered with BlueArc, HP has acquired PolyServe, and Sun acquired CFS for their Lustre file system. Isilon and Exanet are start-up companies with some offerings. EMC acquired Rainfinity, and has hinted at a Hulk/Maui project that they might deliver later this year or perhaps in 2009, but by then they might be a dollar short and a day late.
But why wait? IBM SoFS is available today and is orders of magnitude more scalable!
technorati tags: IBM, SoFS, Acopia, VFM, Brocade, ILM, global namespace, clustered, file system, disk, tape, storage, system, CIFS, NFS, NAS, NTFS, ACL, DFS, AFS, Transarc, ASC Purple, DS3200, SAS, FC, FCP, DS8300, Turbo, DCS9550, SVC, FATA, SATA, nodes, backup, restore, recovery, Blue Cloud, cloud computing, PolyServe, HDS, BlueArc, HP, Sun, CFS, Lustre, Isilon, Exanet, EMC, Rainfinity, Hulk, Maui
Last week, I covered backup issues in [Deduplication versus Best Practice for Backups]. This week, I thought I would cover issues with email.
At IBM, our standard is to have a limit of 200MB per user mailbox. A few of us get exceptions and have up to a 500MB limit because of the work we do. By comparison, my personal Gmail account is now up to 6500MB. When the IBM limit is exceeded, you are unable to send out any mail until your mailbox is brought back below the limit and a request to be "re-enabled for send" is approved, a situation we call "mail jail".
The biggest culprits are attachments. Only 10 percent of emails have attachments, but those that do take up 90 percent of the total space! People attach a 15MB presentation or document, and copy the world on the distribution list. Everyone saves their notes with these attachments, and soon, the limits are blown. Not surprisingly, deduplication has been cited as a "killer app" to address email storage, exactly for this reason. If all the users have their mailboxes stored on the same deduplication storage device, it might find these duplicate blocks, and manage to reduce the space consumed.
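Here is a minimal Python sketch of why duplicate attachments deduplicate so well. For simplicity it matches whole attachments by SHA-256 hash, whereas real deduplication devices typically work on blocks or chunks; the function name and the sample mailboxes are made up for this illustration.

```python
import hashlib


def dedup_stats(mailboxes):
    """mailboxes maps user -> list of attachment bytes.

    Returns (logical_bytes, physical_bytes): what users think they store
    versus what is kept once identical attachments are stored only once.
    """
    store = {}
    logical = 0
    for attachments in mailboxes.values():
        for blob in attachments:
            logical += len(blob)
            # Identical content hashes to the same key, so it is kept once.
            store.setdefault(hashlib.sha256(blob).hexdigest(), blob)
    physical = sum(len(blob) for blob in store.values())
    return logical, physical


if __name__ == "__main__":
    deck = b"x" * 15_000_000          # stand-in for a 15MB presentation
    boxes = {user: [deck] for user in ("ann", "bob", "carl", "dee")}
    print(dedup_stats(boxes))
```

Four copies of the same deck count as 60MB against the users' quotas but need only 15MB on the deduplicating device. Sending a pointer instead avoids storing the copies at all.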
A better practice would be to avoid this in the first place. Here are the techniques I use instead:
- Point to the document in a database
We are heavy users of Lotus Notes databases. These can be encrypted and controlled with Access Control Lists (ACL) that determine who can create or read documents in each database. Annually, all the database ACLs are validated so that people can confirm that they continue to have a need-to-know for the documents in each database. Sending a confidential document as a "document link" to a database entry takes only a few bytes, and all the recipients that are already on the ACL have access to that document.
- Point to the document on a web page
If the document is available on an internal or external website, just send the URL instead of attaching the file. Again, this takes only a few bytes. We have websites accessible only to internal employees, websites that can be accessed only by a subset of employees with special permissions and credentials based on their job role, and websites that are accessible to our IBM Business Partners.
In my case, if I happen to have a blog posting that answers a question or helps illustrate an idea, I will send the "permalink" URL of that blog post in my email.
- Point to the document on shared NAS file system
Internally, IBM uses a "Global Storage Architecture" (GSA) based on IBM's Scale-Out File Services [SoFS] with everyone getting initially 10GB of disk space to store files, with the option to request more if needed. The system has policy-based support for placing and migrating older data to tape to reduce actual disk usage, and combines a clustered file system with a global name space.
My SoFS space is now up to 25GB, and I store a lot of presentations and whitepapers that are useful to others. A URL with "ftp://" or "http://" is all you need to point to a file in this manner, and it greatly reduces the need for attachments. I can map my space as "Drive X:" on my Windows system, or as an NFS mount point on my Linux system, which allows me to easily drag files back and forth.
Departments that don't need to offer "worldwide access" use NAS boxes instead, such as the IBM System Storage N series.
Pointing to files in a shared space, rather than as attachments in email, may take some getting used to. I've had a few recipients send me requests such as "can you send that as an attachment (not a URL)" because they plan to read it on the airplane or train, where they won't have online connectivity.
This all relates to new ways for employees to collaborate. Shawn from Anecdote writes in the post [Fostering a Collaboration Culture]:
"Have you invested in the latest and greatest in collaboration technology but still feel people are still not collaborating? How many Microsoft Sharepoint servers and IBM Quickplaces remain relatively untouched or only used by the organization's technorati? I think it's a big problem because this narrow view of collaboration starts to get the concept a bad name: "yeah, we did collaboration but no one used it." And then there the issue of the vast amount of money wasted and opportunities lost. We can't afford to loose faith in collaboration because the external environment is moving in a direction that mandates we collaborate. The problems we face now and into the future will only increase in complexity and it will require teams of people within and across organizations to solve them."
Well, sending pointers instead of attachments works for me, and has kept me out of "mail jail" for quite some time now.
technorati tags: IBM, deduplication, email, mailbox, Gmail, attachment, Lotus, Notes, database, URL, Permalink, GSA, NAS, SoFS, disk, Anecdote
In addition to creating the Dilbert cartoon, Scott Adams has a blog, which sometimes is quite serious, and other times quite funny. The anticipated 30x cost of "Flash Drives" for Enterprise disk systems reminded me of one of Scott's articles from November 2007 titled [Urge to Simplify]. Here's an excerpt:
Now the casinos have people trained, like chickens hoping for pellets, to take money from one machine (the ATM), carry it across a room and deposit in another machine (the slot machine). I believe B.F. Skinner would agree with me that there is room for even more efficiency: The ATM and the slot machine need to be the same machine.
The casinos lose a lot of money waiting for the portly gamblers with respiratory issues to waddle from the ATM to the slot machines. A better solution would be for the losers, euphemistically called “players,” to stand at the ATM and watch their funds be transferred to the hotel, while hoping to somehow “win.” The ATM could be redesigned to blink and make exciting sounds, so it seems less like robbery.
I’m sure this is in the five-year plan. Longer term, people will be trained to set up automatic transfers from their banks to the casinos. People will just fly to Vegas, wander around on the tarmac while the casino drains their bank accounts, then board the plane and fly home. The airlines are already in on this concept, and stopped feeding you sandwiches a while ago.
Perhaps EMC can redesign its DMX-4 to "blink and make exciting sounds" as well. The Flash Drives were designed for the financial services industry, so those disk systems could be directly connected to make transfers between the appropriate bank accounts.
technorati tags: Scott Adams, Dilbert, B.F. Skinner, ATM, casinos, EMC, DMX-4
I'm here at the Los Angeles airport on my way to Canada.
On my post last week [My Blook is Now Available], Cheryl Hagedorn comments:
I've just posted about your blook at Blooking Central http://blooking.blogspot.com/2007/11/inside-system-storage.html
I'll love to hear from you (I post letters from authors!) about how you put the blook together. Many folks have used cut and paste from blog page into word processor. Others have simply backed up their blogs, then cut and pasted. Some folks had the foresight to compose their posts in a word processor before posting!
Anyway, I'd like to know whatever ins and outs you'd like to share. Thanks.
Well Cheryl, I couldn't find any email address to send you a response, so I decided to post here instead and post a trackback on your blog.
After learning about the Blooker Prize, I had asked our IBM Developerworks team if anyone else within IBM had published a blook, but nobody had heard of anything, so I had to look elsewhere. I got a lot of guidance from Lulu's [Book Publishing FAQs], Don Campbell's [Five Steps to Publishing Your Paperback Book at Lulu], and how-to articles over at [bookcatcher.com].
- Decision 1: Defining the Container
Before you can cut-and-paste anything, you need a container file to put it in. Here were my key decisions:
- Page Size: Novel 6"x9" (15cm x 23cm) to support both perfect-bound paperback and dust-jacket hardcover editions
- Colors: Full-color covers with black-and-white interior
- Fonts: 10pt Book Antiqua for the text, Courier for the monospaced computer examples, 8pt for the "copyright" fine print
- Format: *.doc Microsoft Word file, using [Lulu's ready-to-use templates]
- Software: Office 2003 version of Microsoft Word on a Windows XP system
- Front matter: Title, Copyright, Dedication, Table of Contents, Foreword, Introduction
- Back matter: Blog Roll, Blogging Guidelines, Glossary, Reference table, What people have written about me and my blog
According to Lulu, you could use OpenOffice instead with RTF files. I didn't try that. I did try using CutePDF to upload ready-made PDFs, but that didn't work. I also tried saving text in PDF format on my Mac Mini running OS X 10.4 Tiger, but Lulu didn't like that either. IBM now offers a free download of [Lotus Symphony] that might be an alternative for my next book.
For my blook, the "Blog Roll" serves instead of a more formal [Bibliography]. I could have also included online magazines and other web resources.
- Decision 2: Chapter Configuration
I reviewed other blooks to see how they were organized. I thought I might organize the blog posts by topic or category, but all the blooks I looked at were strictly chronological, oldest post first. This of course is exactly the opposite of how they appear in the web browser. I decided to keep things simple, with just 12 chapters, one for each calendar month.
Each chapter was separated by a section break with unique footers, starting on an odd page number. The footers have the page numbers on the outside edges, so that even pages had numbers on the left, and odd pages on the right. I also added the name of the chapter and the book, like so:
40 ................December 2006| |Inside System Storage.... 41
This was a lot of work, but makes the book look more "professional".
- Decision 3: Cut-and-Paste
People have asked me why it took three months to put my blook together, and I explained that the cut-and-paste process was manually intensive. My posts are either HTML entered directly into Roller webLogger, or typed in HTML on Windows Notepad and cut-and-pasted over to Roller later. I have access to the HTML source of each post, as well as how it appears on the webpage, and tried cut-and-paste both ways. Copying the HTML source meant having to edit out all the HTML tags. I hadn't even looked into the idea of "backing up" all the entries through Roller, but they would probably have been HTML source as well.
It turned out that copying the webpage directly from the browser was better, as this retains more of the formatting, and automatically eliminates all of the pesky HTML tags. I wanted the printed versions to resemble the web page version.
Microsoft Word indicates all hyperlinks as bright blue underlined text which I didn't like, so I removed all hyperlinks, to avoid having to pay extra for "colored pages". This can be done manually, one by one, or by pasting with the "text only" option, but this removes all the other formatting as well. (Specifying black-and-white interior on Lulu might have converted all of these automatically to greyscale, so I might have been safe to leave them in, which I probably could have done if I wanted an online e-book version with links active, ... oh well)
To indicate where the hyperlinks would have been, I wrapped all the linked text in [square brackets]. I have now gotten in the habit of doing this for future blog posts, so if I ever make another book, it will cut down the work and effort on the cut-and-paste.
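For anyone who would rather automate this step, here is a rough Python sketch that strips the tags from a post's HTML source and wraps linked text in square brackets. This is not the process described above, which was done by hand; it uses only the standard library's html.parser module, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class LinkBracketer(HTMLParser):
    """Drop all HTML tags, but wrap the text of <a> links in [square brackets]."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag == "a":
            self.parts.append("]")

    def handle_data(self, data):
        self.parts.append(data)


def bracket_links(html):
    parser = LinkBracketer()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    print(bracket_links('See <a href="http://example.com">my post</a> for <b>details</b>.'))
```

Like pasting with the "text only" option, this throws away bold, italics, and images, so it trades formatting for speed.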
Some of the items I linked to posed a problem. I had to convert YouTube videos to flat images of the first frame to include them in the book. Older links were broken, and I had to find the original graphics. I also sent a note to Scott Adams about the use of one of his Dilbert cartoons.
I decided to also cut-and-paste my technorati tags and comments. For comments I made myself, I labeled them "Addition" or "Response". A few people did not realize that I was "az990tony" making the comments as the blog author, so I changed all to say "az990tony (Tony Pearson)" to make this more clear, and now do this on all future blog posts to minimize the work for my next book.
Because I used a lot of technical terms and acronyms, Microsoft Word actually gave me an error message that there were so many grammatical and spelling errors that it was unable to track them all, and would no longer put wavy green or red lines underneath.
I did all the cut-and-paste work myself, but since the website is publicly accessible, I could have gotten someone else to do this for me. Had I read Timothy Ferriss' book The Four Hour Work Week sooner, I might have taken his advice on [Outsourcing the project to someone in India]. I might consider doing this for my next book.
- Decision 4: Numbering the Posts
I decided I wanted to standardize the title of each post. The date was not unique enough, as there were days that I made multiple posts. So, I decided to assign each a unique number, from 001 to 165, like so:
2006 Dec 12 - The Dilemma over future storage formats (033)
Posts that referred back to one of my earlier posts within the book had (#nnn) added so that readers could jump back to them if they were interested. This eliminated trying to keep track of page numbers.
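The numbering scheme is easy to script. The Python sketch below sorts posts oldest first and builds titles in the format shown above; the function name and the sample posts are made up, and the exact treatment of single-digit days is an assumption, since the one example given uses a two-digit day.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def numbered_titles(posts):
    """posts is a list of (date, title) pairs.

    Returns titles like '2006 Dec 12 - Some title (033)', numbered in
    chronological order. Posts on the same day keep their original order,
    because Python's sort is stable.
    """
    ordered = sorted(posts, key=lambda post: post[0])
    return [
        f"{d.year} {MONTHS[d.month - 1]} {d.day} - {title} ({number:03d})"
        for number, (d, title) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    sample = [
        (date(2006, 12, 12), "Second post"),
        (date(2006, 9, 1), "First post"),
    ]
    for line in numbered_titles(sample):
        print(line)
```

A cross-reference such as (#033) then stays valid no matter how the page numbers shift during editing.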
- Decision 5: Adding behind-the-scenes commentary
One of the reasons I rent or buy DVDs is for the director's audio commentary and deleted scenes. These extras provide added value over what I saw in the movie theatre. Likewise, 80 percent of a blook is already out in the public for reading, so I felt I needed to provide some added value. At the beginning of each month, I describe what is going on behind the scenes, and then in front of specific posts, I provide additional context. This could be context of what was going on in the blogosphere at the time, announcements or acquisitions that happened, what country I was blogging from, or what unannounced products or projects were being developed that I can now talk about since they are now announced and available.
To distinguish these side comments from the rest of the blog posts, I decorated them with graphics. Searching for copyright-free/royalty-free clip-art, graphics, and photos that represented each concept was time-consuming. I shrunk each down to about 1 inch square in size, and changed them from color to greyscale. (Lulu conversion to PDF probably would have automatically converted the color graphics to greyscale for me, in which case leaving them in full color might have been nice for an e-book edition, ... oh well)
I did complete each chapter one at a time. So, for each month, I cut-and-pasted all the blog posts, tags and comments, then fixed up and numbered all the post titles, then added all the behind-the-scenes commentary, and cleaned up all the font styles and sizes. I recommend you do this at least for the first chapter, so you can get a good feel for what the finished version will look like.
- Decision 6: Adding a Glossary
I sent early copies of the book to five of my coworkers knowledgeable about storage, and five local friends who know nothing about storage.
Some of my early reviewers suggested having an index, so that people can find a specific post on a particular topic. Others suggested I spell out all the acronyms that appear everywhere and put that into the Reference section, rather than on each and every occurrence in the book itself. Both were good ideas, and my IBM colleague Mike Stanek suggested calling it a GOAT (Glossary of Acronyms and Terms). Acronyms are spelled out, and terms or phrases that need additional explanation have a glossary definition. For each item, I put the post or posts that use that term. Some terms are covered in dozens of posts, so I tried to pick five or fewer posts representing the most pertinent.
The glossary was far more time-consuming than I first imagined, with over 50 pages containing over 900 entries. I struggled deciding which terms and acronyms needed explanation, and which were obvious enough. On the good side, it forced me to read and re-read the entire book cover to cover, and I caught a lot of other mistakes, misspellings, and formatting errors that way. Also, I have a large international readership on my blog, so the glossary will help those for whom English is not their native language, and will help those readers who are not necessarily experts in the storage industry.
- Decision 7: Designing the Covers
Up to this point, I had been printing early drafts with simple solid color covers. Lulu has three choices for covers:
- Just type in the text, upload an "author's photo" and choose a background color or pattern
- Upload PNG files, one for the front cover, one for the back cover, and choose the text and color of the spine.
- Upload a single one-piece PDF file that wraps around the entire book.
I had no software to generate the PDF for the third option, so I decided to try the second option. My first attempt was to format the front title page in Word, capture the screen, convert to PNG and upload it as the front cover. I did the same for the back cover, with a small picture of me and some paragraphs about the book.
I chose a simple straightforward title on purpose. Thousands of IBM and other IT marketing and technical people will be ordering this book, and submitting their expenses for reimbursement as work-related, and I didn't want to cause problems with a cute title like "An Engineer in Marketing La-La Land".
The next step was to use [the GIMP] GNU image manipulation program, similar to PhotoShop, to add a cream colored background, a slanted green spine, and some graphics that we had developed professionally for some of our IBM presentations. I learned how to use the GIMP when making tee-shirts and coffee mugs for our [Second Life] events, so I was already familiar with it. For new blook authors, I suggest they learn how to use this for their covers, or find someone who can do this for them.
I did the paperback version first, and once done, it was easy to use the same PNG files for the dust jacket of the hardcover edition, adding some extra words for the front and back flaps.
The adage "Don't judge a book by its cover" seems to apply to everything except books themselves. The book cover is the first impression online, and in a bookstore. I have seen people pick books up off the shelf at my local Barnes & Noble, read the front and back covers, peruse the front and back flaps, and make a purchase decision without ever flipping a single page of the contents inside. From an article on Book Catcher [SELF-PUBLISHING BOOK PRODUCTION & MARKETING MISTAKES TO AVOID]:
According to selfpublishingresources website, three-fourths of 300 booksellers surveyed (half from independent bookstores and half from chains) identified the look and design of the book cover as the most important component of the entire book. All agreed that the jacket is the prime real estate for promoting a book.
While many struggle to find the right title and cover art, I think it is interesting that Lulu lets you post the same book with slightly different titles and covers, each as separate projects, and let market forces decide which one people like best. This is a common practice among market research firms.
- Decision 8: Finding someone to write the Foreword
With the book nearly done, I thought it would be a nice touch to have an IBM executive write a Foreword at the front of the book. Several turned me down, so I am glad I found a prominent Worldwide IBM executive to do it. I should have started this process sooner, as she wanted to read my book in its entirety before putting pen to paper. I had not planned for this. I was hoping to be done by end of October, but waiting for her to finish writing the Foreword added some extra weeks. Next time, I will start this process sooner.
- Decision 9: Printing Early Drafts
You need to have Lulu print at least one copy to review before making it available to the public, and it doesn't hurt to order a few intermediary draft copies to make sure everything looks right. However, from the time I order it on Lulu, to the time it is in my hands, is over two weeks with standard shipping, so I needed a way to print drafts to look at in between.
To avoid wear-and-tear on my color ink-jet printer, I went and bought a large black-and-white [Brother HL-5250DN] laser printer. Rather than buying specialty 6x9 paper, I used standard 8.5x11 paper using the following 2-up duplex method:
- Upload the DOC file to Lulu, and get it converted to PDF
- Download the resulting PDF from Lulu back to your computer
- View the PDF in Adobe Reader, and print it using 2-up "Booklet" mode.
For example, if you print 60 pages in booklet mode, it prints two mini-pages on the front side, and two more mini-pages on the back side of each sheet of paper, resulting in 15 standard 8.5" x 11" sheets that can be folded, stapled, and read like a mini-booklet. My entire blook could be printed on seven of these mini-booklets, saving paper, and giving me a close approximation to what the final book would look like. Each mini-page is 5.5"x8.5", so just slightly smaller than the final 6"x9" form factor. I found that 60 pages/15 sheets was about the maximum before it becomes hard to fold in half.
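If you are curious how booklet mode arranges the pages, here is a small Python sketch of the standard fold-in-half (saddle-stitch) layout. It is an illustration of the arithmetic, not Adobe Reader's actual code, and the function names are made up for this example.

```python
import math


def booklet_sheets(pages):
    """Each sheet carries four mini-pages: two on the front, two on the back."""
    return math.ceil(pages / 4)


def booklet_layout(pages):
    """Return, for each sheet from the outside in, ((front left, front right),
    (back left, back right)) page numbers for a fold-in-half booklet.

    Page numbers larger than `pages` are blank filler pages.
    """
    total = booklet_sheets(pages) * 4
    layout = []
    for i in range(total // 4):
        front = (total - 2 * i, 1 + 2 * i)
        back = (2 + 2 * i, total - 1 - 2 * i)
        layout.append((front, back))
    return layout


if __name__ == "__main__":
    print(booklet_sheets(60), "sheets for 60 pages")
    for sheet in booklet_layout(8):
        print(sheet)
```

The outermost sheet pairs the last page with the first, and each sheet further in steps two pages toward the middle, which is why the pages read in order once the stack is folded.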
So, if I had to do it all over again, I might have chosen 11pt Garamond (the default), or changed the default to 11pt Book Antiqua up front, so as not to have to spend so much time converting the fonts. I might have left out the glossary. I might have left in all the hyperlinks and graphics in full color for a separate e-book edition. And I definitely would have looked for an author for my Foreword much earlier in the process.
I didn't plan to write a blook when I started blogging. I have started putting [square brackets] around all my links. I have started putting "az990tony (Tony Pearson)" on all my comments. I had assumed that people were jumping to all the links I provided in context, but I learned that the blog post has to stand on its own, so now I make sure that I either paraphrase the important parts, or actually quote the text that I feel is important, so that the blog post makes sense on its own. This is perhaps good advice in general, but even more important if you plan to write a blook later.
Lastly, I decided up front to write blog posts that were 500-700 words long, about the average length of magazine or newspaper articles. In my blook, the average is 639 words per post, so I hit that goal. I have seen some blogs where each post is just a few sentences. Maybe they are posting from their cell phone, or don't have time to think out a full thought, but who wants to read a year's worth of [twitter] entries?
Well Cheryl, I hope that helps. If you need any more, click on the "email" box on the right panel.
technorati tags: Cheryl Hagedorn, Blooking Central, Lulu, Don Campbell, IBM, Developerworks, Book Antiqua, Courier, Garamond, Microsoft, Word, OpenOffice, Lotus, Symphony, PDF, CutePDF, OS X, HTML, Hyperlinks, blook, reference, glossary, Twitter, Timothy Ferriss, fourhourworkweek, outsourcing, India
Continuing my week's theme on Innovations that matter, I thought I would tackle energy efficiency and the recent excitement over the Smart car.
USA Today had an article [America crazy about breadbox on wheels called Smart car]. This car weighs only 2400 pounds, gets a respectable 33 MPG City, and 40 MPG Highway, with a list price of $11,590 US dollars. These have been in Europe for some time now. The "Smart" name comes from combining the S from Swatch, the M from Mercedes and ART. The car was designed by Nicholas Hayek, founder of the SWATCH wristwatch line, and manufactured by Daimler, who also makes Mercedes cars.
We have many communities here in Tucson where people drive street-legal golf carts. People don't realize it, but both electric and electric/gas hybrid golf carts have been around for a long time. Some of the nicer golf carts run for about $7,000 US dollars, with a shelf on the back that can hold two sets of golf clubs, or groceries. Of course, you would never take a golf cart on the highway, so that is where the Smart car comes in: with a 10 gallon tank, it could easily get you from one major city to another.
Like golf carts, the Smart-for-Two model being sold in the US will hold only two people, which is perfect for many American families. The standard 4-person or 5-person sedan is too big for most DINKS (Dual Income, No Kids), and other families with kids often opt for the 7-person SUV instead.
It is good to see that energy consumption is finally getting the attention it deserves. IBM recently announced some exciting offerings to help data centers manage their energy consumption:
- IBM Systems Director Active Energy Manager V3.1 [AEM]:
A new, key component of IBM's [Cool Blue portfolio] offering, AEM helps clients manage and even potentially lower energy costs. According to Gartner, insufficient power and excessive heat remain the greatest challenges in the data center. With AEM, IT managers can understand exact power/cooling costs, manage the efficiency of the current environment and reduce energy costs. AEM is the only energy management software tool that can provide clients with a single view of the actual power usage across multiple IBM platforms, including x86, blades, Power and storage systems, with plans to extend support to the mainframe.
- IBM Usage and Accounting Manager Virtualization Edition V7.1 [UAV] for System p and System x:
UAV gives IT managers more information to manage data center costs. These powerful usage management tools are designed to accurately measure, analyze, and report resource utilization of virtualized/consolidated/shared resources. With UAV, IT managers can better manage costs and justify new systems by determining who is using how much of which resource; assessing the cost of an IT service or application; and accurately charging each user or department. Working with AEM capabilities, it will also allow tracking of energy consumption costs by server and by user. This level of reporting eliminates a key inhibitor to the adoption of virtualization and consolidation and further differentiates IBM systems.
- IBM Tivoli Usage and Accounting Manager [UAM]:
This solution -- ideal for heterogeneous IT shops -- serves as an accurate measurement tool underlying billing processes and SLA compliance. UAM provides usage-based accounting and charging for virtually any IT resources across the enterprise -- ranging from mainframes to virtualized servers to storage networks and more. The Usage and Accounting Manager Virtualization offerings seamlessly integrate into it.
Whether you are trying to reduce energy consumption in your data center, or in your transportation around town, these innovations can help you stay "green".
technorati tags: Smart Car, USAToday, golf cart, street legal, hybrid, MPG, green, energy, IBM Systems, Director, AEM, UAV, UAM, TUAM, SLA, management, virtualization, DINKS, SUV
A few weeks ago, my Tivo(R) digital video recorder (DVR) died. All of my digital clocks in my house were flashing 12:00 so I suspect it was a power strike while I was at the office. The only other item to die was the surge protector, and so it did what it was supposed to do, give up its own life to protect the rest of my equipment. Although somehow, it did not protect my Tivo.
I opened a problem ticket with Sony, and they sent me instructions on how to send it over to another state to get it repaired. Amusingly, the instructions included "Please make a backup of the drive contents before sending the unit in for repair." Excuse me? How am I supposed to do that, exactly?
My model has only a single 80GB drive, and so my friend and I removed the drive and attached it to one of our other systems to see if anything was salvageable. It failed every diagnostic test. There was just not enough to read to be usable elsewhere.
This is typical of many home systems. They are not designed for robust usage, high availability, nor any form of backup/recovery process. Some of the newer models have two drives in a RAID-1 mode configuration, but most have many single points of failure.
And certainly, it is not mission critical data. Life goes on without the last few episodes of Jack Bauer on "24", or the various Food Network shows that I recorded for items I plan to bake some day. For the past few weeks, I have spent more time listening to the radio and reading books. Somehow, even though my television runs fine without my Tivo, watching TV in "real time" just isn't the same.
I suspect that if you gave someone a method to do the backup, most would not bother to use it. People are now relying more and more heavily on their home-based information storage systems, digital music, video and cherished photographs. Perhaps experiencing a "loss" will help them appreciate backup/recovery systems so much more than they do today.
technorati tags: Tivo, Digital Video Recorder, DVR, RAID, backup, recovery, loss, information, storage, systems
A recent blog post by Chris Mellor advances the outlandish conspiracy theory that IBM and HDS copied virtualisation technology from small start-up company DataCore. (Chris doesn't actually name his source for this claim, or say whether that someone was employed by any of the parties involved at the time the events occurred, or is currently employed by a competitor like EMC, bitterly jealous of the success IBM and HDS currently enjoy with their offerings.)
As I already posted before about IBM's long history of storage virtualization, SAN Volume Controller was really part of a sequence of major products in this area, after the successful 3850 MSS and 3494 VTS block virtualization products.
In the late 1990's, our research teams in Almaden, California and Hursley, UK were exploring storage technologies that could take advantage of commodity hardware parts and the industry-leading Linux operating system.
As is often the case, while IBM was working on "the perfect product", small start-ups announced "not-yet-perfect" products into the marketplace. A tactical move like partnering with DataCore was smart, for the following reasons:
- Helps identify market segments. Identify which subset of customers would most benefit from disk virtualization. While our 3850 MSS and 3494 VTS were focused on mainframe customers, this new technology was focused on distributed Unix, Windows and Linux servers.
- Helps prioritize market requirements. What are the most appealing features? What drives clients to buy disk virtualization for distributed systems platforms?
- Helps evaluate packaging options. Should we deliver pure software and expect customers to purchase their own servers? Should we offer this as a "service offering" with installation and deployment services included? Should we offer this as hardware with software pre-installed?
The partnership proved worthwhile, not just to prove to IBM that this was a worthwhile market to enter, but also to show how "NOT" to package a solution. Specifically, DataCore SANsymphony was software that you had to install on your own Windows-based server. The client was left with the task of ordering a suitable Intel-based server, with the right amount of CPU cycles, RAM and host bus adapter ports, and configuring the Windows operating system and DataCore software.
It didn't go well. Basically, customers were expected to be their own "hardware engineers", having to know way too much about storage hardware and software to design a combination that worked for their workloads. Most clients were disappointed with the amount of effort involved, and the resulting poor performance.
To fix this, IBM delivered the SAN Volume Controller, with an optimized Linux operating system and internally-written software that runs on IBM System x(tm) server hardware optimized for performance.
I can't speak for HDS, but I suspect they came to similar conclusions that resulted in a similar decision to build their product in-house. I welcome Hu Yoshida to correct me if I am wrong on this.
technorati tags: Chris Mellor, DataCore, SANsymphony, IBM, SVC, HDS, EMC, Invista, disk, storage, virtualization, Hu Yoshida, Windows, Linux
What a great way to wrap up another excellent week!
While I was away on vacation last week, IBM Storage and Software Offerings won Brand Impact 2007 Awards from leading brand marketing organization Liquid Agency at the Brand Summit Awards Dinner. Other awards went to Cisco, Google and Sony, companies that I also highly admire.
For those in the USA, next Monday is Memorial Day. I'll be in Australia, and they have a similar ANZAC Day which happened last month (April 25).
Have a safe weekend!
technorati tags: IBM, storage, software, awards, brand, impact, Liquid Agency, dinner, Cisco, Google, Sony, Memorial Day, ANZAC
In Storage Technology News, Marc Staimer makes his Seven network storage predictions for 2007. Let's take a closer look at each one.
- Federal Rules for Civil Procedures (FRCP) will increase adoption of unstructured data classification, email archive systems and CAS.
CAS continues to flounder, but the rest I can agree with. Regulations are being adopted worldwide. Japan has its own Sarbanes-Oxley (SOX) style legislation going into effect in 2008. IBM TotalStorage Productivity Center for Data is a great tool to help classify unstructured file systems. IBM CommonStore for email supports both Microsoft Exchange and Lotus Domino, and can be connected to IBM System Storage DR550 for compliance storage.
- Unified storage systems (combined file and block storage target systems) will become increasingly attractive in 2007, because of their ease of use and simplicity.
I agree with this one also. Our sales of IBM N series in 2006 were great, and look set to continue their strong growth in 2007. The IBM N series brings together FCP, iSCSI and NAS protocols into one disk system. With the SnapLock(tm) feature, N series can store both re-writable data, as well as non-erasable, non-rewriteable data, on the same box. Combine the N series gateway on the front-end with SAN Volume Controller on the back-end, and you have an even more powerful combination.
- Distributed ROBO backup to disk will emerge as the fastest growing data protection solution in 2007.
IDC had a similar prediction for 2006. ROBO refers to "Remote Office/Branch Office", and so ROBO backup deals with how to back up data that is out in the various remote locations. Do you back it up locally, or send it to a central location? Fortunately, IBM Tivoli Storage Manager (TSM) supports both ways, and IBM has introduced small disk and tape drives and auto-loaders that can be used in smaller environments like this. I don't know whether "backup to disk" will be the fastest growing, but I certainly agree that a variety of ROBO-related issues will be of interest this year.
- 2007 will be remembered as the year iSCSI SAN took off because of the much reduced pricing for 10 Gbit iSCSI and the continued deployment of 10 Gbit iSCSI targets.
While I agree that iSCSI is important, I can't say 2007 will be remembered for anything. We have terrible memories for these things. Ask someone what year Personal Computers (PC) took off, and they will tell you about Apple's famous 1984 commercial. Ask someone when the Internet took off, cell phones took off, etc., and I suspect most will provide widely different answers, most likely based on their own experience.
For the longest time, I resisted getting a cell phone. I had a roll of quarters in my car, and when I needed to make a call, I stopped at the nearby pay-phone, and made the call. In 1998, pay phones disappeared. You can't find them anymore. That was the year cell phones took off, at least for me.
Back to iSCSI, now that you can intermix iSCSI and SAN on the same infrastructure, either through intelligent multi-protocol switches available from your local IBM rep, or through an N series gateway, you can bring iSCSI technology in slowly and gradually. Low-cost copper wiring for 10 Gbps Ethernet makes all this very practical.
Another up-and-coming technology is AoE, or ATA-over-Ethernet. Same idea as iSCSI, but taken down to the ATA level.
- CDP will emerge as an important feature on comprehensive data protection products instead of a separate managed product.
Here, CDP stands for Continuous Data Protection. Normal backups work like a point-and-shoot camera, taking a picture of the data once every midnight, for example. CDP records all the little changes like a video camera, with the option to rewind or fast-forward to a specific point in the day. IBM Tivoli CDP for Files, for example, is an excellent complement to IBM Tivoli Storage Manager.
The technology is not really new, as it has been implemented as "logs" or "journals" on databases like DB2 and Oracle, as well as business applications like SAP R/3.
The prediction here, however, relates to packaging. Will vendors "package" CDP into existing backup products, possibly as a separately priced feature, or will they leave it as a separate product that perhaps, as in IBM's case, is already well integrated?
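To make the camera analogy concrete, here is a minimal sketch in Python of the journal idea. It is purely illustrative and not how IBM Tivoli CDP for Files or any database log is actually implemented; the class and method names are my own, and it assumes changes arrive in time order.

```python
import bisect

class ChangeJournal:
    """Append-only log of changes, replayable to any point in time."""

    def __init__(self):
        self._times = []    # timestamps, in ascending order
        self._changes = []  # (name, contents) recorded at the matching timestamp

    def record(self, timestamp, name, contents):
        # Assumption: writes are recorded in time order, as in a log
        self._times.append(timestamp)
        self._changes.append((name, contents))

    def state_at(self, timestamp):
        # "Rewind" by replaying every change made at or before the given time
        end = bisect.bisect_right(self._times, timestamp)
        state = {}
        for name, contents in self._changes[:end]:
            state[name] = contents
        return state

journal = ChangeJournal()
journal.record(9, "report.doc", "draft 1")    # 9am
journal.record(14, "report.doc", "draft 2")   # 2pm
journal.record(16, "budget.xls", "v1")        # 4pm

print(journal.state_at(10))   # the day as it looked at 10am
print(journal.state_at(24))   # what a midnight backup would capture
```

A once-a-day backup only ever sees the last state; the journal can also recover "draft 1" from the morning.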
- The VTL market growth will continue at a much reduced rate as backup products provide equivalent features directly to disk. Deduplication will extend the VTL market temporarily in 2007.
VTL here refers to Virtual Tape Library, such as IBM TS7700 or TS7510 Virtualization Engine. IBM introduced the first one in 1997, the IBM 3494 Virtual Tape Server, and we have remained number one in market share for virtual tape ever since. I find it amusing that people are only now looking at VTL technology to help with their Disk-to-Disk-to-Tape (D2D2T) efforts, when IBM Tivoli Storage Manager has had the capability to back up to disk, then move to tape, since 1993.
As for deduplication, if you need the end-target box to deduplicate your backups, then perhaps you should investigate why you are doing this in the first place. People take full-volume backups, and keep too many copies of them, when more sophisticated backup software like Tivoli Storage Manager can implement backup policies to avoid this with a progressive backup scheme. Or maybe you need to investigate why you store multiple copies of the same data on disk; perhaps NAS or a clustered file system like IBM General Parallel File System (GPFS) could provide you a single copy accessible to many servers instead.
The reason you don't see deduplication on the mainframe is that DFSMS for z/OS already allows multiple servers to share a single instance of data, and has been doing so since the early 1980s. I often joke with clients at the Tucson Executive Briefing Center that you can run a business with a million data sets on the mainframe, and that there are probably a million files on just the laptops in the room, but few would attempt to run their business that way.
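The difference between repeated full backups and a progressive scheme can be sketched as follows. This is a toy illustration only: the function names and the in-memory "catalog" are my own assumptions, and real products such as Tivoli Storage Manager decide what has changed in their own way, not necessarily by hashing file contents.

```python
import hashlib

def fingerprint(data):
    """Content fingerprint used to detect whether a file has changed."""
    return hashlib.sha256(data).hexdigest()

def progressive_backup(files, catalog):
    """Return the names of files that actually need to be sent.
    `catalog` remembers what was already backed up and is updated in place."""
    sent = []
    for name, data in sorted(files.items()):
        digest = fingerprint(data)
        if catalog.get(name) != digest:   # new or changed since last backup
            catalog[name] = digest
            sent.append(name)
    return sent

catalog = {}
files = {"a.txt": b"alpha", "b.txt": b"beta"}
first = progressive_backup(files, catalog)    # everything is new
second = progressive_backup(files, catalog)   # nothing changed, nothing sent
files["b.txt"] = b"beta, revised"
third = progressive_backup(files, catalog)    # only the changed file
print(first, second, third)
```

If unchanged data is never sent a second time, there is far less duplication left for the target box to remove.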
- Optical storage that looks, feels and acts like NAS and puts archive data online, will make dramatic inroads in 2007.
Marc says he's going out on a limb here, and it's good to make at least one risky prediction. IBM used to have an optical library that emulated disk, called the IBM 3995. Lack of interest and advancement in technology encouraged IBM to withdraw it. A small backlash ensued, so IBM now offers the IBM 3996 for the System p and System i clients that really, really want optical.
As for optical making data available "online", it takes about 20 seconds to load an optical cartridge, so I would consider this more "nearline" than online. Tape is still in the 40-60 second range to load and position to data, so optical is still at an advantage.
Optical eliminates the "hassles of tape"? Tape data is good for 20 years, and optical for 100 years, but nobody keeps drives around that long anyways. In general, our clients change drives every 6-8 years, and migrate the data from old to new. This is only a hassle if you didn't plan for this inevitable movement. IBM Tivoli Storage Manager, IBM System Storage Archive Manager, and the IBM System Storage DR550 all make this migration very simple and easy, and can do it with either optical or tape.
The Blu-ray vs. HD DVD debate will continue through 2007 in the consumer world. I don't see this being a major player in more conservative data centers where a big investment in the wrong choice could be costly, even if the price-per-TB is temporarily in-line with current tape technologies. IBM and others are investing a lot of Research and Development funding to continue the downward price curve for tape, and I'm not sure that optical can keep up that pace.
Well, that's my take. It is a sunny day here in China, and I have more meetings to attend.
technorati tags: IBM, FRCP, SOX, TotalStorage, Productivity Center, Microsoft, Exchange, Lotus, Domino, DR550, SnapLock, unified storage, NAS, iSCSI, FCP, ROBO, Tivoli, Storage Manager, TSM, Ethernet, AoE, CDP, DB2, Oracle, SAP, VTL, TS7700, TS7510, GPFS, DFSMS, Optical, 3995, 3996, Blue-Ray, D2D2T,DVD
Continuing my week's theme on travel, conferences, and Japan, I saw two items in the news that seem to follow a common theme.
- According to the "The Daily Yomiuri", a local Japanese paper, "double happy weddings" are becoming more and more popular in Japan. These would be called "shotgun" weddings in the US, but in Japan, couples pay extra to have a wedding between the fifth and seventh month of pregnancy. As Dave Barry would say, I am not making this up. 27% of couples in Japan got married during or after a pregnancy. The logic is that they can celebrate both events with one ceremony. Many couples believe that the primary purpose of marriage is to have children, and some that fail to have children suffer terrible anguish or divorce. Waiting until the bride is pregnant helps ensure the couple will be "successful" in this regard.
- IBM acquires Softek, a software company that develops a product called Transparent Data Migration Facility (TDMF) to move mainframe data from one disk system to another, while applications are running. This can be used, for example, to move data from outdated disk systems to IBM disk systems. This is not to be confused with IBM's archive and retention software partner, Princeton Softech.
Softek is the software spin-off of Fujitsu (a Japanese computer hardware manufacturer). For a while, Fujitsu made IBM-compatible mainframe servers, but was not successful at developing its own system software, relying heavily on IBM for this. Unable to compete against IBM, it stopped making mainframe servers, but continues making other kinds of hardware equipment.
With TDMF, the process of moving data is simple. The software runs on z/OS and intercepts all writes intended for source volumes on the old array, and re-directs a copy to destination volumes on the new device. Systems can run with old and new equipment side by side for a few weeks, with the new device staying in-sync with the old. When the client is ready to cross over, the systems are pointed to the new disk, and the old disk systems are detached and removed from the sysplex.
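The general idea of intercepting writes and mirroring them until cut-over can be sketched in a few lines of Python. To be clear, this is a toy model under my own assumptions (volumes as dictionaries, a hypothetical MirroringMigrator class), not TDMF's actual design or interfaces.

```python
class MirroringMigrator:
    """Toy model: mirror writes to old and new volumes until cut-over."""

    def __init__(self, source, target):
        self.source = source      # blocks on the old disk system
        self.target = target      # blocks on the new disk system
        self.cut_over = False

    def copy_existing(self):
        # Background copy of data that was already on the old volume
        for block, data in list(self.source.items()):
            self.target[block] = data

    def write(self, block, data):
        # Before cut-over, every write goes to both volumes
        if not self.cut_over:
            self.source[block] = data
        self.target[block] = data

    def read(self, block):
        volume = self.target if self.cut_over else self.source
        return volume[block]

    def in_sync(self):
        return self.source == self.target

    def switch(self):
        # Point applications at the new volume only once it matches the old
        if not self.in_sync():
            raise RuntimeError("volumes are not in sync yet")
        self.cut_over = True

old_disk = {0: "payroll", 1: "orders"}
new_disk = {}
migrator = MirroringMigrator(old_disk, new_disk)
migrator.write(1, "orders v2")    # applications keep running during the move
migrator.copy_existing()
migrator.switch()                 # cross over; the old disk can now be detached
migrator.write(2, "invoices")
print(new_disk)
```

After the switch, the old volume no longer receives writes, which is the point at which it could be detached.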
Afraid that installing TDMF will mess with your applications? IBM Global Technology Services (GTS) is able to roll-in a separate mainframe, move the data, then disconnect it along with the old storage.
(For customers running Linux, UNIX or Windows on other platforms, IBM offers SAN Volume Controller (SVC). While SVC is not marketed as a "data migration device", per se, it does have this capability. Many clients were able to cost-justify purchase of an SVC to move data from old storage to new in similar fashion to how TDMF works on the mainframe.)
What do these stories have to do with one another, other than both relating to Japan? IBM has been using TDMF for years as part of a service offering to move data from one disk system to another. Since Sam Palmisano took over in 2002, IBM has acquired 51 companies, 31 of them software companies. Often, these have been "successful", turning quickly profitable, because IBM was already well familiar with the companies it acquires, in much the same way that husbands are well familiar with their brides-to-be at a "double happy wedding".
So, welcome Softek! It looks like it's time to celebrate again!
technorati tags: IBM, Japan, Daily Yomiuri, double happy, wedding, shotgun, Dave Barry, Softek, TDMF, z/OS, Fujitsu
Continuing on my theme of storage area networking, today I thought I would cover the concept of convergence. This is the notion of disparate things that come together.
Convergence plays a big role in Apple's new iPhone. ExpatJane has a nice collection of news articles. Gizmodo has a two part hands-on experience of the iPhone here and here. Seth Godin opines that the iPhone is not for everyone.
I would fall into the "not for me" category, at least at this time. The iPhone is a GSM-capable phone with the ability to store 4GB or 8GB of music, photos and video, and has incorporated a 2 megapixel camera. Currently, I have separate components:
- A cell phone that is GSM plus CDMA, with features like "speakerphone" which I use quite a lot, but NO camera.
- A 7 megapixel camera, also very small, with removable memory cards.
- A 60GB iPod, with music and photos. My model is older and doesn't handle videos.
Since I visit government agencies, research and development labs, and other places that don't allow cameras, I have to either choose a cell phone that does not have camera capability in it, or have a camera phone that I leave behind in the car or at the front desk. I have chosen to get cell phones with NO camera. So, NOT having a camera is a primary feature I look for, but this is getting harder and harder these days. I don't know if Apple plans to have a non-camera version of their iPhone, but that would be a deal-breaker for me.
I do carry a separate camera, and where it is permissible, use it separately. This is especially useful if you do a lot of whiteboard or flipchart presentations, and want to capture what you have written for later. (For a great example of how effectively whiteboards can be used, check out these videos from UPS.) A picture is worth a thousand words, and it is easier to convey an idea with pictures, especially in countries where people may not speak English. Last month, I got a 7 megapixel camera to replace my 5 megapixel. For my work, 2 megapixel as found in the iPhone is not detailed enough.
As for my iPod, I enjoy that I can carry 60GB of music and photos. When I go on vacations, I can bring my camera and iPod, and connect the two, transferring and viewing the pictures that I take. I can easily free up 5-10 GB of space on my iPod for photos in preparation for a trip, then replace that with music when I am back at home. I also use my iPod as a remote disk drive for my laptop on business trips. Again, the 4GB and 8GB may not be enough for what I need.
Printers were never converged into Personal Computers, but they did have their own convergence. I have a multi-function printer/scanner/fax machine. I used to have separate printer, scanner and fax machines, but now the technology is so inexpensive that it got all combined into one solution.
The same is happening for Storage Area Networking gear.
- Thanks to Fibre Channel, switches and directors can handle both SCSI commands (FCP) and CCW commands (FICON). This allows the mainframe and distributed systems to converge their traffic onto a single network, and is less expensive than trying to maintain one network for the mainframes, and another for the distributed platforms.
- On the SCSI side, there are now switches that let you have pluggable ports of different flavors. For example, you can have some ports be Fibre Channel to receive FCP, and other ports to be Ethernet to carry iSCSI. iSCSI is a protocol co-developed between IBM and Cisco to carry SCSI commands over Ethernet. Since most computers already have Ethernet "network interface cards" and most buildings are already wired with an Ethernet infrastructure, this provides a less expensive alternative to Fibre Channel.
- Routers, and combination Router/Switches, can send all the FCP/FICON/iSCSI traffic over various long distances to remote data centers, using either iFCP or FCIP protocols. This is a less expensive alternative to dropping your own private "dark fiber" between the two locations, which often involves negotiating access rights to dig trenches through other people's property.
Which brings me back to Apple's iPhone. One device can make calls, watch video, and download webpages all because the networks have converged into sending all data in "packets". The network just routes packets from one place to another. It doesn't care that a packet is a voice packet, a video packet or a webpage packet. It doesn't matter.
This convergence then lets the convenience of a handheld device serve as the conduit for doing business, potentially replacing the credit card. IBM helped Visa and Nokia join forces to use cell phones as wallets. According to the article...
"Users can pay for groceries and other purchases by swiping a phone over a reader that electronically communicates with a microchip on the phone. Phone owners confirm the purchase with the push of a button and the deal is complete.
The platform is the result of many years of trials around the world and will enable mobile contactless payments, remote payments, person-to-person payments, and mobile coupons."
Now that's convergence I can get excited about!
technorati tags: IBM, SAN, Apple, iPhone, GSM, CDMA, iPod, UPS, whiteboard, FCP, FICON, SCSI, iSCSI, Ethernet, iFCP, FCIP, dark fiber, Visa, Nokia, Cisco , convergence
For those of us in the northern hemisphere, yesterday was this year's Winter Solstice, representing the shortest amount of daylight between sunrise and sunset. So today, I thought I would blog on my thoughts about managing scarcity.
Earlier in my career, I had the pleasure to serve as "administrative assistant" to Nora Denzel for the week at a storage conference. My job was to make her look good at the conference, which if you know Nora, doesn't take much. Later, she left IBM to work at HP, and I got to hear her speak at a conference, and the one thing that I remember most was her statement that the whole point of "management" was to manage scarcity, as in not enough money in the budget, not enough people to implement change, or not enough resources to accomplish a task. (Nora, I have no idea where you are today, so if you are reading this, send me a note).
Of course, the flip-side to this is that resources that are in abundance are generally taken for granted. Priorities are focused on what is most scarce. Let's examine some of the resources involved in an IT storage environment:
- Capacity - while everyone complains that they are "running out of space", the truth is that most external disk attached to Linux, UNIX, or Windows systems contains only 20-40% data. Many years ago, I visited an insurance company to talk about a new product called IBM Tivoli Storage Manager. This company had 7TB of disk on their mainframe, and another 7TB of disk scattered on various UNIX and Windows machines. In the room were TWO storage admins for the mainframe, and 45 storage admins for the distributed systems. My first question was "why so many people for the mainframe, certainly one of you could manage all of it yourself, perhaps on Wednesday afternoons?" Their response was that they acted as each other's backup, in case one goes on vacation for two weeks. My follow-up question to the rest of the audience was: "When was the last time you took two weeks vacation?" Mainframes comfortably fill their disk and tape storage to 80-90% full of data or more, primarily because they have a more mature, robust set of management software, like DFSMS.
- Labor - by this I mean skilled labor able to manage storage for a corporation. Some companies I have visited keep their new-hires off production systems for the first two years, working only on test or development systems until then. Of course, labor is more expensive in some countries than others. Last year, I was doing a whiteboard session on-site for a client in China, and the last dry-erase pen ran out of ink. I asked for another pen, and they instead sent someone to go re-fill it. I asked wouldn't it be cheaper just to buy another pen, and they said "No, labor is cheap, but ink is expensive." Despite this, China does complain that there is a shortage of a skilled IT labor force, so if you are looking for a job, start learning Mandarin.
- Power and Cooling - Most data centers are located on raised floors, with large trunks of electrical power and huge air conditioning systems to deal with all the heat generated from each machine. I have visited the data centers of clients that are forced now to make decisions on storage based on power and cooling consumption, because the costs to upgrade their aging buildings are too high. Leading the charge is IBM, with technology advancements in chips, cards, and complete systems that use less power, and generate less heat. While energy is still fairly cheap in the grand scheme of things, fears of Global Warming and declining oil supplies have put the costs of power and cooling in the news lately. In 1956, Hubbert predicted the US would reach peak oil supplies by 1965-1970 (it happened in 1971), and this year Simmons estimated that world-wide oil production already began its decline in 2005. Smart companies like Google have moved their server farms to places like Oregon in the Pacific Northwest for cheaper hydroelectric power.
- Bandwidth - Last year IBM introduced 4Gbps Fibre Channel and FICON SAN networking gear, along with the servers and storage needed to complete the solution. 4Gbps equates to about 400 MB/sec in data throughput. By comparison, iSCSI is typically run on 1Gbps Ethernet, but has so much overhead that you only get about 80 MB/sec. Next year, we may see both 8 Gbps SAN, and 10 GbE iSCSI, to provide 800 MB/sec throughput. My experience is that the SAN is not the bottleneck; instead, people run out of bandwidth at the server or storage end first. They may not have a million dollars to buy the fastest IBM System p5 servers, or may not have enough host adapters at the storage system end.
- Floorspace - I end with floorspace because it reminds me that many "shortages" are temporary or artificially created. Floorspace is only in short supply because you don't want to knock down a wall, or build a new building, to handle your additional storage requirements. In 1997, Tihamer Toth-Fejel wrote an article for the National Space Society newsletter that estimated that ... "Everybody on Earth could live comfortably in the USA on only 15% of our land area, with a population density between that of Chicago and San Francisco. Using agricultural yields attained widely now, the rest of the U.S. would be sufficient to grow enough food for everyone. The rest of the planet, 93.7% of it, would be completely empty." Of course, back in 1997 the world population was only 5.9 billion, and this year it is over 6.5 billion.
This last point brings me back to the concept of food, and I am not talking about doughnuts in the conference room, or pizza while making year-end storage upgrades. I'm talking about the food you work so hard to provide for yourself and your family. The folks at Oxfam came up with a simple analogy. If 20 people sit down at your table, representing the world’s population:
- 3 would be served a gourmet, multi-course meal, while sitting at a decorated table on a cushioned chair.
- 5 would eat rice and beans with a fork and sit on a simple cushion.
- 12 would wait in line to receive a small portion of rice that they would eat with their hands while sitting on the floor.
So for those of you planning a special meal next Monday, be thankful you are one of the lucky three, and hopeful that IBM will continue to lead the IT industry to help out the other seventeen.
Happy Winter Solstice!
technorati tags: IBM, Northern, Hemisphere, Winter, Solstice, Nora+Denzel, Oxfam, scarcity, Linux, UNIX, Windows, TSM, Tivoli+Storage+Manager, storage, admins, global+warming, climate+change, peak+oil, National+Space+Society, special, meal