IDC, an independent industry analyst firm, put out their 4Q07 "Worldwide Disk Storage Systems Quarterly Tracker" report. Here is an excerpt from their [press release]:
"Worldwide external disk storage systems factory revenues posted 9.8 percent year-over-year growth in the fourth quarter of 2007 (4Q07) and totaling $5.3 billion (USD), according to the IDC Worldwide Disk Storage Systems Quarterly Tracker. For the quarter, the total disk storage systems market grew to $7.5 billion (USD), up 7.6 percent from the prior year's fourth quarter. Total disk storage systems capacity shipped reach 1,645 petabytes, growing 56.3 percent."
For those wondering how an industry could grow 56.3 percent in capacity, but only 7.6 percent in revenue, it is because the average dollar-per-GB dropped in 2007 from $6.63 down to $4.56 (USD), representing a 31 percent decline. In the past, disk prices dropped 40 to 60 percent each year, so single-digit revenue growth was the best major vendors could hope for. Lately, however, the decline has slowed to 25 to 35 percent per year, while client demand for capacity continues at the 60 percent pace, which means that vendors could achieve double-digit revenue growth soon.
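As a rough sanity check on that arithmetic, here is a minimal Python sketch. It assumes revenue is simply capacity shipped times average price per GB, which glosses over product mix and over the difference between quarterly and full-year figures; the function name `revenue_growth` is my own.

```python
# Figures quoted above; revenue = capacity x price is a simplifying assumption.
price_before = 6.63      # average USD per GB before the drop
price_after = 4.56       # average USD per GB after the drop
capacity_growth = 0.563  # 56.3 percent more capacity shipped

def revenue_growth(capacity_growth, price_decline):
    """Implied revenue growth when capacity grows and price per GB falls."""
    return (1 + capacity_growth) * (1 - price_decline) - 1

price_decline = 1 - price_after / price_before
implied = revenue_growth(capacity_growth, price_decline)

print(f"price decline: {price_decline:.1%}")
print(f"implied revenue growth: {implied:.1%}")
```

The implied growth lands within a fraction of a point of the 7.6 percent IDC reported. Calling `revenue_growth(0.60, 0.25)` shows why a 25 percent price decline against 60 percent capacity demand would leave room for double-digit revenue growth.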
Once again, IBM was ranked number 1 in total disk storage. No surprise there. Here are the details:
"Total Disk Storage Systems Market
In the total worldwide disk storage systems market, IBM lead the market with 22.9 percent followed by HP with 18.1 percent revenue share. EMC maintained the third position with 16.0 percent revenue share.
For the full year, the total disk storage systems market posted 6.6 percent growth to $26.3 billion (USD). In the total worldwide disk storage systems market, IBM and HP lead the market in statistical tie with 20.1 percent and 19.4 percent revenue share, respectively. EMC maintained the third position with 15.2 percent revenue share."
But why focus just on disk? IDC also released their "Worldwide Combined Disk and Tape Storage 3Q07 Market Share Update", and IBM was number one for that as well, taking in 21.9 percent share. Here's a quote from IBM VP Barry Rudolph in [CNN Money]:
"IBM's continued leadership in the storage hardware market reaffirms our strategy to provide the most comprehensive tiered portfolio of storage offerings, ranging from software and services to disk and tape storage solutions," said Barry Rudolph, Vice President, Storage Stack Solutions, IBM. "IBM is the clear choice for providing information infrastructure solutions that offer the most cost-efficient, streamlined approach to help our customers increase overall productivity and maximize performance."
It is looking like 2008 is going to be a good year for IBM!
technorati tags: IBM, IDC, 4Q07, 3Q07, marketshare, market share, EMC, HDS, HP, Sun, NetApp, Dell, disk, tape, systems
No, this is not an announcement about myself moving to Nepal.
My friends over at OLE Nepal are [looking for a Super SysAdmin] willing to live in Nepal for five months and help out with their project to help the students in the local schools there. I think this might be a great opportunity for someone to help change the world. Those of you who have read my past blog posts about One Laptop per Child [OLPC], such as [Understanding the LAMP platform] and [Supporting OLPC Schools with LAMP stacks], may understand the type of work involved.
- You dream in Bash
- IPv4, IPv6, Wireless Mesh networking? No problem! You know Linux networking inside and out
- Extensive knowledge of BIND, DHCPD, Squid, Apache, security, etc.
- Experience working with [Moodle] would be most excellent (it is basically a PHP web application that maintains MySQL databases for lesson plans, homework assignments and other school related information)
- Adept with Python scripting or could learn it quickly. OLPC has standardized on Python for scripting (although knowledge in Perl and PHP won't hurt either)
- You look to implement a practical solution that less skilled sysadmins can easily maintain over a cooler but more complicated solution.
- You play well with others. You don’t alienate collaborators with rude e-mails that assert your technical superiority (even though you are)
- Your primary concern is meeting the educational needs of kids and teachers. You rate technical awesomeness a distant second to meeting those critical needs.
I've been working with Dev, Bryan and Sulochan for the past three months (remotely, here from Tucson, AZ), but we've come to a point where we need on-site expertise. I will continue to provide remote support.
Given the number of readers who have contacted me over the past year looking for an IT job (or a different job because they are not happy where they are), this could be an amazing experience.
technorati tags: OLE Nepal, OLPC, Bash, Linux, IPv6, Mesh, networking, Squid, Apache, security, Moodle, LAMP, PHP, Perl, Python
It's been a while since I've talked about [Second Life].
The latest post on eightbar, [Spimes, Motes and Data centers], discusses IBM's use of virtual world technology to analyze data centers in three dimensions. New World Notes asks [What's The Point Of 3D Data Centers?] One would think that a simple monitoring tool based on a two-dimensional floor plan would be enough to evaluate a data center.
Enter Michael Osias, IBM (a.k.a. Illuminous Beltran in Second Life). Some of the leading news sites have begun to notice some of the 3D data centers that he has helped pioneer. UgoTrade writes up an article about Michael and the media attention in [The Wizard of IBM's 3D Data Centers].
Of course, in presenting these "Real Life/Second Life" (RL/SL) interactive technologies, IBM is sometimes the target of ridicule. Why? Because IBM is 10 years ahead of everyone else. So, are there aspects of a data center where 3D interfaces make sense? I think there are.
- Topology Viewer
IBM TotalStorage Productivity Center has an awesome "topology viewer" that shows which servers are connected to which switches, and to which disk systems and tape libraries. This is all done in a 2D diagram, generated dynamically with data discovered through open standard interfaces, similar to what you might draw manually with tools like Visio. Imagine, however, how much more powerful it would be as a 3D viewer, with virtual equipment mapped to the physical location of each piece of hardware, including its position in the rack and the rack's location on the data center floor.
- Temperature Flow
Designing computer room air conditioning (CRAC) systems is actually a three-dimensional problem. Cold air is fed underneath the raised floor, comes up through strategically placed "vent" tiles, and is taken in at the front of each rack. Hot air comes out the back of each rack, and hopefully finds a ceiling duct intake to get cooled again. The temperature six inches off the floor is different from the temperature six feet off the floor, and 3D monitoring tools could be helpful in identifying "hot spots" that need attention. In this case "spimes" represent sensors in the 3D virtual world, able to report back information to help diagnose problems or monitor events.
- Server consolidation
After many people left the mainframe in favor of running a single application per distributed server, the pendulum has finally swung back. Companies are discovering the many benefits of changing this behavior. "Re-centralization" is the task at hand. Thanks to virtualization of servers, networks and storage, sharing common resources can once again claim the benefits of economies of scale. In many cases, servers work together in collective units for specific applications that would benefit from being consolidated onto the same equipment.
IBM's "New Enterprise Data Center" vision recognizes that people will need to focus on the management aspects of their IT infrastructure, and 3D virtual world technologies might be an effective way to get the job done.
technorati tags: secondlife, eightbar, spimes, motes, 3D, data center, virtual world, IBM, TotalStorage, Productivity Center, CRAC, re-centralization, New Enterprise Data Center
I am always amused by the manner in which the IT industry tries to solve problems. Take, for example, the process of backups. The simplest approach is to back up everything, and keep "n" versions of that. Simple enough for a small customer who has only a handful of machines, but it does not scale well. In my post [Times a Million], I coined the phrase "laptop mentality", referring to people's inability to think through solutions at large scale.
Apparently, I am not alone. Steve Duplessie (ESG) wrote in his post [Random Thoughts]:
"I may even get to stop yelling at people to stop doing full backups every week on non-changing data (which is 80 %+) just because that's how they used to do it. They won't have a choice. You can't back up 5X your current data the way you do (or don't) today."
Hu Yoshida (HDS) does a great job explaining that there are three ways to perform deduplication for backups:
- Pre-processing. Have the backup software not back up unchanged data.
- Inline processing. Have an index to filter the output of the backup as it sends data to storage.
- Post-processing. Have the receiving storage detect duplicates and handle them accordingly.
Here's an excerpt from his post [Deduplication Ratios]:
"A full backup of 1TB data base tablespace is taken on day one. The next day another full backup is taken and only 2GB of that backup has any changes.
Using traditional full backup approaches after 2 nights, the backup capacity required is 2 x 1TB = 2TB
One method of calculating de-duplication ratios could yield a low ratio:
- Total de-duplicated backup capacity used = 1TB + 2GB = 1.002TB
- If the de-duplication ratio compares the amount of total physical storage used to the total amount that would have been used by traditional backup methods, the ratio = 2TB / 1.002TB = approximately 2:1
Another method of calculating de-duplication ratios could yield a high ratio:
- Total de-duplicated backup capacity used still = 1.002TB
- If the de-duplication ratio compares the amount of data stored in the most recent (second) backup to the amount that would have been used by traditional backup methods, the ratio 1TB / 2GB = 1000GB / 2GB = 500:1"
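To make the two calculations in that excerpt concrete, here is a small Python sketch that reproduces both ratios from the same inputs. It uses decimal units (1 TB = 1000 GB) as the excerpt does; the variable names are mine.

```python
GB_PER_TB = 1000  # decimal units, matching the quoted example

full_backup_gb = 1 * GB_PER_TB  # size of each nightly full backup
changed_gb = 2                  # data that changed before the second backup
nights = 2

traditional_gb = nights * full_backup_gb       # 2 TB with two plain full backups
deduplicated_gb = full_backup_gb + changed_gb  # 1.002 TB actually stored

# Method 1: everything that would have been stored vs. everything stored.
cumulative_ratio = traditional_gb / deduplicated_gb
# Method 2: only the most recent backup vs. the new data it added.
latest_backup_ratio = full_backup_gb / changed_gb

print(f"cumulative ratio: {cumulative_ratio:.2f}:1")
print(f"latest-backup ratio: {latest_backup_ratio:.0f}:1")
```

Same data, same storage consumed, and the headline number differs by a factor of 250 depending on which method is chosen.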
While IBM also offers deduplication in the IBM System Storage N series disk systems, I find that for backup, it is often more effective to apply best practices via IBM Tivoli Storage Manager (TSM). Let's take a look at some:
- Exclude Operating System files
Why take full backups of your operating system every day? Yes, deduplication will find a lot to reduce from this, but best practices would exclude these. TSM has an include/exclude list, and the default version excludes all the operating system files that would be recovered from "bare machine recovery" or "new system install" procedures. Often, if the replacement machine has different gear inside, your OS backups aren't what you need, and a fresh OS install may determine this and install different drivers or different settings.
- Exclude Application programs
Again, yes, if there are several machines running the same application, you probably have an opportunity for deduplication. However, unless you match these up with the appropriate registry or settings buried down in the operating system, recovering just application program files may render an unusable system. Applications are best installed from a common source that is either "pushed" through software distribution, or "pulled" from an application installation space.
If you have TB-sized databases, and are only doing full backups daily to protect them, have I got a solution for you. IBM and others have software that is "application-aware" and "database-aware" enough to determine what has changed since the last backup and copy only that delta. Taking advantage of the TSM Application Programming Interface (API) allows both IBM and third-party tools to take these delta backups correctly.
- User Files
That leaves us with user files, which are often different enough from the files of other users that they would not benefit from file-level deduplication. Backing up changed data only, as TSM does with its patented ["progressive incremental backup"] method, generally gets most of the benefits described by deduplication, without having to purchase storage hardware features.
Of course, if two or more users have identical files, the question might be why these are not stored on a common file share. NAS file share repositories can greatly reduce the need for each user to keep their own set of duplicates. It is interesting that some block-oriented deduplication, such as that found in the IBM System Storage N series, can get some benefit because user files are often derivatives of other files, and there might be some 4 KB blocks of data in common.
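To illustrate why derivative files can share 4 KB blocks, here is a toy Python sketch of fixed-size block deduplication. This is my own illustration and not how the N series implements it; the function names and the sample data are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KB blocks, as mentioned above

def block_hashes(data: bytes):
    """Split data into fixed 4 KB blocks and fingerprint each one."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]

def dedup_stats(files):
    """Return (total_blocks, unique_blocks) across all the given files."""
    total = 0
    unique = set()
    for data in files:
        hashes = block_hashes(data)
        total += len(hashes)
        unique.update(hashes)
    return total, len(unique)

# Hypothetical data: a 3-block original and a derivative with one block edited.
original = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
derivative = b"A" * BLOCK_SIZE + b"X" * BLOCK_SIZE + b"C" * BLOCK_SIZE

total, unique = dedup_stats([original, derivative])
print(f"{total} blocks written, {unique} unique blocks stored")
```

Note that with fixed-size blocks, an edit that inserts or deletes bytes shifts every later block boundary, so the benefit depends heavily on how the derivative was produced.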
Last November, I visited a customer in Canada. All of their problems were a direct result of taking full backups every weekend. It put a strain on their network; it used up too many disk and tape resources; and it took too long to complete. They asked about virtual tape libraries, deduplication, and anything else that could help them. The answer was simple: switch to IBM Tivoli Storage Manager and apply best practices.
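The "back up only what changed" idea behind these best practices can be sketched in a few lines of Python. This is a simplified illustration and not TSM's actual progressive incremental algorithm; the function name and the in-memory `catalog` dictionary are assumptions of mine.

```python
import os

def changed_since_last_backup(paths, catalog):
    """Return the paths whose size or modification time differs from the catalog.

    `catalog` maps path -> (size, mtime_ns) as recorded by the previous run,
    and is updated in place so the next run sees the new state.
    """
    changed = []
    for path in paths:
        st = os.stat(path)
        signature = (st.st_size, st.st_mtime_ns)
        if catalog.get(path) != signature:
            changed.append(path)        # new or modified: send it
            catalog[path] = signature   # remember what was backed up
    return changed
```

The first run against an empty catalog selects everything, which amounts to the one and only full backup. Every later run selects just the files that are new or modified, so unchanged data never crosses the network again.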
technorati tags: Steve Duplessie, ESG, Hu Yoshida, HDS, deduplication, N series, application-aware, database-aware, database, tablespace, best practice, Tivoli, Storage Manager, TSM, progressive, incremental, backup
On Tuesday, I covered much of the Feb 26 announcements, but left the IBM System Storage DS8000 for today so that it can have its own special focus.
Many of the enhancements relate to z/OS Global Mirror, which we formerly called eXtended Remote Copy or "XRC", not to be confused with our "regular" Global Mirror that applies to all data. For those not familiar with z/OS Global Mirror, here is how it works. The production mainframe writes updates to the DS8000, and the DS8000 keeps track of these in cache until a "reader" can pull them over to the secondary location. The "reader" is called System Data Mover (SDM), which runs in its own address space under the z/OS operating system. Thanks to some work my team did several years ago, z/OS Global Mirror was able to extend beyond z/OS volumes and include Linux on System z data. Linux on System z can use a "Compatible Disk Layout" (CDL) format (now the default) that meets all the requirements to be included in the copy session.
IBM has over 300 deployments of z/OS Global Mirror, mostly banks, brokerages and insurance companies. The feature can keep tens of thousands of volumes in one big "consistency group" and asynchronously mirror them to any distance on the planet, with the secondary copy recovery point objective (RPO) only a few seconds behind the primary.
- Extended Distance FICON
Extended Distance FICON is an enhancement to the industry-standard FICON architecture (FC-SB-3) that can help avoid degradation of performance at extended distances by implementing a new protocol for "persistent" Information Unit (IU) pacing. This deals with the number of packets in flight between servers and storage separated by long distances, and can keep a link fully utilized at 4Gbps FICON up to 50 kilometers. This is particularly important for the z/OS Global Mirror "reader", System Data Mover (SDM). By having many "reads" in flight, this enhancement can help reduce the need for spoofing or channel-extender equipment, or allow you to choose lower-cost channel extenders based on "frame-forwarding" technology. All of this helps reduce your total cost of ownership (TCO) for a complete end-to-end solution.
This feature will be available in March as a no-charge update to the DS8000 microcode. For more details, see the [IBM Press Release].
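To see why the number of packets in flight matters at distance, here is a back-of-the-envelope Python sketch of the bandwidth-delay product. The figure of roughly 5 microseconds per kilometer of fiber is my assumption, and the calculation treats 4 Gbps as usable bandwidth, ignoring encoding and protocol overhead.

```python
LINK_GBPS = 4.0        # 4Gbps FICON link, from the text above
DISTANCE_KM = 50.0     # distance quoted in the text above
FIBER_US_PER_KM = 5.0  # assumption: about 5 microseconds per km, one way

def bytes_in_flight(link_gbps, distance_km, us_per_km=FIBER_US_PER_KM):
    """Data that must be outstanding to keep the link busy for one round trip."""
    round_trip_s = 2 * distance_km * us_per_km * 1e-6
    bits = link_gbps * 1e9 * round_trip_s
    return bits / 8

needed = bytes_in_flight(LINK_GBPS, DISTANCE_KM)
print(f"about {needed / 1000:.0f} KB must be in flight")
```

Under these assumptions, roughly a quarter of a megabyte has to be outstanding at any moment. If pacing limits the sender to less than that, the link sits idle waiting for acknowledgements, which is the degradation that persistent IU pacing is meant to avoid.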
- z/OS Global Mirror process offload to zIIP processors
To understand this one, you need to understand the different "specialty engines" available on the System z.
On distributed systems where you run a single application on a single piece of server hardware, you might pay "per server", "per processor" or lately "per core" for dual-core and quad-core processors. Software vendors were looking for a way to charge smaller companies less, and larger companies more. However, you might end up paying the same whether you use a 1GHz or a 4GHz Intel processor, even though the latter can do four times more work per unit time.
The mainframe has a few processors for hundreds or thousands of business applications. In the beginning, all engines on a mainframe were general-purpose "Central Processor" or CP engines. Based on their cycle rate, IBM was able to publish the number of Millions of Instructions Per Second (MIPS) that a machine with a given number of CP engines can do. With the introduction of side co-processors, this was changed to "Millions of Service Units" or MSU. Software licensing can charge per MSU, and this allows applications running in as little as one percent of a processor to get appropriately charged.
One of the first specialty engines was the IFL, the "Integrated Facility for Linux". This was a CP designated to only run z/VM and Linux on the mainframe. You could "buy" an IFL on your mainframe much cheaper than a CP, and none of your z/OS application software would count it in the MSU calculations because z/OS can't run on the IFL. This made it very practical to run new Linux workloads.
In 2004, IBM introduced "z Application Assist Processor" (zAAP) engines to run Java, and in 2006, the "z Integrated Information Processor" (zIIP) engines to run database and background data movement activities. Not counting these engines in the MSU number for business applications greatly reduced the cost of mainframe software.
Tuesday's announcement is that the SDM "reader" will now run in a zIIP engine, reducing the costs for applications that run on that machine. Note that the CP, IFL, zAAP and zIIP engines are all identical cores. The z10 EC has up to 64 of these (16 quad-core) and you can designate any core as any of these engine types.
- Faster z/OS Global Mirror Incremental Resync
One way to set up 3-site disaster recovery protection is to have your production synchronously mirrored to a second site nearby, and at the same time asynchronously mirrored to a remote location. On the System z, you can have site "A" using synchronous IBM System Storage Metro Mirror over to nearby site "B", and also have site "A" sending data over to site "C" using z/OS Global Mirror. This is called "Metro z/OS Global Mirror" or "MzGM" for short.
In the past, if the disk in site A failed, you would switch over to site B, and then send all the data all over again. This is because site B was not tracking what the SDM reader had or had not yet processed. With Tuesday's announcement, IBM has developed an "incremental resync" where site B figures out what the incremental delta is to connect to the z/OS Global Mirror at site "C", and this is 95% faster than sending all the data over.
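The idea of site B tracking the delta can be pictured with a toy Python model. To be clear, this is only my own illustration of change tracking in general, not a description of how the DS8000 or SDM actually implement incremental resync; the class and method names are hypothetical.

```python
class SiteBTracker:
    """Toy model: site B remembers which tracks site C has not yet received."""

    def __init__(self):
        self.pending = set()

    def record_write(self, track):
        # A synchronously mirrored write arrived at B; C may not have it yet.
        self.pending.add(track)

    def confirm_at_c(self, tracks):
        # The reader has delivered these tracks to C, so stop tracking them.
        self.pending.difference_update(tracks)

    def resync_set(self):
        # After site A fails, only these tracks need to go from B to C.
        return sorted(self.pending)


tracker = SiteBTracker()
for track in [1, 2, 3, 4, 5]:
    tracker.record_write(track)
tracker.confirm_at_c([1, 2, 3])
print(tracker.resync_set())
```

Without that bookkeeping, site B has no choice but to resend every track on every volume. With it, the resync is limited to the handful of tracks still outstanding.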
- IBM Basic HyperSwap for z/OS
What if you are sending all of your data from one location to another, and one disk system fails? Do you declare a disaster and switch over entirely? With HyperSwap, you only switch over the disk systems, but leave the rest of the servers alone. In the past, this involved hiring IBM Global Technology Services to implement a Geographically Dispersed Parallel Sysplex (GDPS) with software that monitors the situation and updates the z/OS operating system when a HyperSwap has occurred. All application I/O that was being written to the primary location is automatically re-routed to the disks at the secondary location. HyperSwap can do this for all the disk systems involved, allowing applications at the primary location to continue running uninterrupted.
HyperSwap is a very popular feature, but not everyone has implemented the advanced GDPS capabilities. To address this, IBM now offers "Basic HyperSwap", which is actually going to be shipped as IBM TotalStorage Productivity Center for Replication Basic Edition for System z. This will run in a z/OS address space, and use either the DB2 RDBMS you already have, or provide an Apache Derby database for those few out there who don't have DB2 on their mainframe already.
Update: There has been some confusion on this last point, so let me explain the key differences between the different levels of service:
- Basic HyperSwap: single-site high availability for the disk systems only
- GDPS/PPRC HyperSwap Manager: single- or multi-site high availability for the disk systems, plus some entry-level disaster recovery capability
- GDPS/PPRC: highly automated end-to-end disaster recovery solution for servers, storage and networks
I apologize to all my colleagues who thought I implied that Basic HyperSwap was a full replacement for the more full-function GDPS service offerings.
- Extended Address Volumes (EAV)
Up until now, the largest volume you could have was only 54 GB in size, and many customers are still using 3 GB and 9 GB volume sizes. Now, IBM will introduce 223 GB volumes. You can have any kind of data set on these volumes, but only VSAM data sets can reside on cylinders beyond the first 65,280. That is because many applications still think that 65,280 is the largest cylinder number you can have.
This is important because a mainframe, or a set of mainframes clustered together, can only have about 60,000 disk volumes total. The 60,000 is actually the Unit Control Block (UCB) limit, and besides disk volumes, you can have "virtual" PAVs that serve as an alias to existing volumes to provide concurrent access.
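A quick Python sketch shows what the volume size means against that limit. It treats all of the roughly 60,000 UCBs as full-size base volumes, which is an upper bound only, since in practice some of them are consumed by PAV aliases and smaller volumes; the function name is mine.

```python
UCB_LIMIT = 60_000  # approximate device limit quoted above

def max_capacity_pb(volume_gb, volumes=UCB_LIMIT):
    """Upper bound on addressable disk capacity, in decimal petabytes."""
    return volume_gb * volumes / 1_000_000

old_max = max_capacity_pb(54)   # largest volume size until now
new_max = max_capacity_pb(223)  # with Extended Address Volumes

print(f"54 GB volumes:  {old_max:.2f} PB")
print(f"223 GB volumes: {new_max:.2f} PB")
```

In other words, larger volumes raise the ceiling on total capacity by about a factor of four without needing any more device addresses.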
Aside from the first item, Extended Distance FICON, the other enhancements are "preview announcements", which means that IBM has not yet worked out the final details of price, packaging or delivery date. In many cases, the work is done, has been tested in our labs, or is running in beta at select client locations, but for completeness I am required to make the following disclaimer:
All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice. Availability, prices, ordering information, and terms and conditions will be provided when the product is announced for general availability.
technorati tags: IBM, z10 EC, DS8000, z/OS Global Mirror, XRC, SDM, CDL, RPO, FICON, dual-core, quad-core, Intel, MIPS, MSU, zAAP, IFL, zIIP, Hyperswap, DB2, Apache, Derby, UCB, VSAM, EAV