Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Executive Briefing Center in Tucson, Arizona, and a featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
IBM wins lots of awards, but this time is unique: [IBM and Fox Networks Group have jointly won an Engineering Emmy® Award] for Innovation from the Academy of Television Arts & Sciences. According to the Academy, by improving the ability of media companies to capture, manage and exploit content in digital form, IBM and Fox have fundamentally changed the way that audio and video content is managed and stored. Here's an excerpt from the IBM Press Release:
"By standardizing technologies in this way, Fox can now use open-standard, file-based tape in all aspects of production, post-production and distribution functions – displacing costly proprietary tape formats and/or disk subsystems. This provides media companies with the consumer equivalent of having their entire library of DVDs online and available at any time, and the ability to go to a specific scene, in any one of the movies, in an instant.
In the early stage of the technology initiative, the IBM/Fox team applied IBM-patented technologies invented by IBM Research for high-speed data movement. They also integrated traditional broadcast transport and encoding standards with IT industry open standards. This allowed either Standard Definition (SD) or HD programming to be available in real time for digital recording and repurposing -- with improved economics."
Unfortunately, people didn't like the name, but they loved the acronym, so it was renamed to Linear Tape File System. IBM offers LTFS single-drive edition on its LTO-5 and TS1140 tape drives, and LTFS library-edition across all of its tape libraries. Since everyone hates proprietary vendor lock-in, IBM has graciously shared LTFS as an open source standard with the rest of the Linear Tape Open consortium.
(Note: I was not there at the awards ceremony. The pictures were taken by Ed Childers, David Pease and Rainer Richter of each other. Additional photos are available on this [Flickr photo album].)
(1) Rainer Richter, Media Technology Market Partners LLC [MTMP], presenting the Emmy to Steve Canepa, IBM General Manager for Media and Entertainment industry. MTMP is an IBM Business Partner that offers integrated solutions for LTO and LTFS, consulting, services, and technology to the media and entertainment industry…
(2) Ed Childers, IBM manager of the Tape Drive Development team, holding the Emmy. Fellow IBM blogger Steve Hamm credits Ed with coming up with the idea for LTFS seven years ago, in his blog post [Coding and Loading in Las Vegas: How a Team of IBM Researchers Helped Transform the Way Video is Stored]. Ed wanted to make tape storage easier to use and to integrate it into the workflow of networks and studios, and suggested using an indexing system that would allow people to write software that would make video more accessible.
(3) David Pease, IBM Senior Technical Staff Member from the IBM Almaden Research Center, holding the Emmy. Along with Lucas Villa Real (IBM Brazil) and Michael Richmond (IBM Almaden), David and his team were able to come up with a working prototype in just four months. Michael discusses this in his posts [Tape? Does anyone care about Tape anymore?] and [the Emmy goes to... LTFS].
Of course, technology is only worthwhile if you put it to use. Our friends at FOX initially partnered with IBM to develop this video archive solution for the National Football League (NFL). If there is one place that "re-purposes" a lot of video footage, it is sports television. The technology proved so useful that FOX has since expanded it to other types of programming.
Well, it's 2008, which could mark the end to RAID5 and mark the beginnings of a new disk storage architecture. IBM starts the year with exciting news, acquiring new disk technology from a small start-up called XIV, led by former-EMCer Moshe Yanai. Moshe was ousted publicly in 2001 from his position as EMC's VP of engineering, and formed his own company. It didn't take long for EMC bloggers to poke fun at this already. Mark Twomey, in his StorageZilla blog, had mentioned XIV before back in August, [XIV], and again today in [IBM Buys XIV].
To address the new requirements associated with next generation digital content, IBM chose XIV and its NEXTRA™ architecture for its ability to scale dynamically, heal itself in the event of failure, and self-tune for optimum performance, all while eliminating the significant management burden typically associated with rapid growth environments. The architecture also is designed to automatically optimize resource utilization of all the components within the system, which can allow for easier management and configuration and improved performance and data availability.
"We are pleased to become a significant part of the IBM family, allowing for our unique storage architecture, our engineers and our storage industry experience to be part of IBM's overall storage business," said Moshe Yanai, chairman, XIV. "We believe the level of technological innovation achieved by our development team is unparalleled in the storage industry. Combining our storage architectural advancements with IBM's world-wide research, sales, service, manufacturing, and distribution capabilities will provide us with the ability to have these technologies tackle the emerging Web 2.0 technology needs and reach every corner of the world."
The NEXTRA architecture has been in production for more than two years, with more than four petabytes of capacity being used by customers today.
Current disk arrays were designed for online transaction processing (OLTP) databases. The focus was on using the fastest, most expensive 10K and 15K RPM Fibre Channel drives, with clever caching algorithms for quick small updates of large relational databases. However, the world is changing, and people now are looking for storage designed for digital media, archives, and other Web 2.0 applications.
One problem that NEXTRA architecture addresses is RAID rebuild. In a standard RAID5 6+P+S configuration of 146GB 10K RPM drives, the loss of one disk drive module (DDM) was recovered by reconstructing the data from parity of the other drives onto the spare drive. The process took 46 minutes or longer, depending on how busy the system was doing other things. During this time, if a second drive in the same rank fails, all 876GB of data are lost. Double-drive failures are rare, but unpleasant when they happen, and hopefully you have a backup on tape to recover the data from. Moving to slower, less expensive SATA drives made this situation worse. The drives have higher capacity, but run at slower speeds. When a SATA drive fails in a RAID5 array, it could take several hours to rebuild, and that is more time exposure for a second drive failure. A rebuild for a 750GB SATA drive would take five hours or more, with 4.5 TB of data at risk during the process if a second drive failure occurs.
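The exposure figures in the paragraph above are simple arithmetic, and a short Python sketch makes them easy to rerun for other drive sizes. The function name `raid5_exposure` is made up for illustration, decimal units (1GB = 1000MB) are assumed, and the "implied rebuild rate" is just drive capacity divided by the quoted rebuild time, not a measured figure.

```python
def raid5_exposure(data_drives, drive_gb, rebuild_hours):
    """Return (data at risk in GB, implied average rebuild rate in MB/s).

    data_drives   -- number of data drives in the rank (6 in a 6+P+S rank)
    drive_gb      -- capacity of each drive in decimal GB (assumption)
    rebuild_hours -- quoted time to rebuild one failed drive onto the spare
    """
    at_risk_gb = data_drives * drive_gb
    rebuild_mb_per_s = drive_gb * 1000 / (rebuild_hours * 3600)
    return at_risk_gb, rebuild_mb_per_s


# 6+P+S rank of 146GB Fibre Channel drives, 46-minute rebuild
fc_at_risk, fc_rate = raid5_exposure(6, 146, 46 / 60)

# Same rank layout with 750GB SATA drives, five-hour rebuild
sata_at_risk, sata_rate = raid5_exposure(6, 750, 5)

print(f"FC:   {fc_at_risk} GB at risk, ~{fc_rate:.0f} MB/s rebuild")
print(f"SATA: {sata_at_risk} GB at risk, ~{sata_rate:.0f} MB/s rebuild")
```

The point of the comparison is that the window of exposure grows roughly with drive capacity, because the rate at which a single spare can be rebuilt does not improve along with it.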
The Nextra architecture doesn't use traditional RAID ranks or spare DDMs. Instead, data is carved up into 1MB objects, and each object is stored on two physically-separate drives. In the event of a DDM loss, all the data is readable from the second copies that are spread across hundreds of drives. New copies are made on the empty disk space of the remaining system. This process can be done for a lost 750GB drive in under 20 minutes. A double-drive failure would only lose those few objects that were on both drives, so perhaps 1 to 2 percent of the total data stored on that logical volume.
Losing 1 to 2 percent of data might be devastating to a large relational database, as this could impact the entire access to the internal structure. However, this box was designed for unstructured content, like medical images, music, videos, Web pages, and other discrete files. In the event of a double-drive failure, individual files would be recovered, such as with IBM Tivoli Storage Manager backup software.
IBM will continue to offer high-speed disk arrays like the IBM System Storage DS8000 and DS4800 for OLTP applications, and offer NEXTRA for this new surge in digital content of unstructured data. Recognizing this trend, disk drive module manufacturers will phase out 10K RPM drives, and focus on 15K RPM for OLTP, and low-speed SATA for everything else.
Update: This blog post was focused on the version of XIV box available as of January 2008 that was built by XIV prior to the IBM acquisition. IBM has since made a major revision, made available August 2008, that addresses a variety of workloads, including database, OLTP, email, as well as digital content and unstructured files. Contact your IBM representative or IBM Business Partner for the latest details!
Bottom line, IBM continues to celebrate the new year, while the EMC folks in Hopkinton, MA will continue to nurse their hangovers. Now that's a good way to start the new year!
This week, July 26-30, 2010, I am in Washington DC for the annual [2010 System Storage Technical University]. As with last year, we have joined forces with the System x team. Since we are in Washington DC this time, IBM added a "Federal Track" to focus on government challenges and solutions. So, basically, IBM is offering attendees the option to attend three conferences for one low price.
This conference was previously called the "Symposium", but IBM changed the name to "Technical University" to emphasize the technical nature of the conference. No marketing puffery like "Journey to the Private Cloud" here! Instead, this is bona fide technical training, qualifying attendees to count this towards their Continuing Professional Education (CPE).
(Note to my readers: The blogosphere is like a playground. In the center are four-year-olds throwing sand into each other's faces, while mature adults sit on benches watching the action, and only jumping in as needed. For example, fellow blogger Chuck Hollis (EMC) got sand in his face for promising to resign if EMC ever offered a tacky storage guarantee, and then [failed to follow through on his promise] when it happened.
Several of my readers asked me to respond to another EMC blogger's latest [fistful of sand].
A few months ago, fellow blogger Barry Burke (EMC) committed to [stick to facts] in posts on his Storage Anarchist blog. That didn't last long! BarryB apparently has fallen in line with EMC's over-promise-then-under-deliver approach. Unfortunately, I will be busy covering the conference and IBM's robust portfolio of offerings, so I won't have time to address BarryB's stinking pile of rumor and hearsay until next week or later. I am sorry to disappoint.)
This conference is designed to help IT professionals make their business and IT infrastructure more dynamic and, in the process, help reduce costs, mitigate risks, and improve service. This technical conference event is geared to IT and Business Managers, Data Center Managers, Project Managers, System Programmers, Server and Storage Administrators, Database Administrators, Business Continuity and Capacity Planners, IBM Business Partners and other IT Professionals. This week will offer over 300 different sessions and hands-on labs, certification exams, and a Solutions Center.
For those who want a quick stroll through memory lane, here are my posts from past events:
In keeping up with IBM's leadership in Social Media, the IBM Systems Lab Services and Training team running this event has its own [Facebook Fan Page] and
[blog]. IBM Technical University has a Twitter account [@ibmtechconfs], and hashtag #ibmtechu. You can also follow me on Twitter [@az990tony].
Fellow Blogger BarryB mentions "chunk size" in his post [Blinded by the light], as it relates to Symmetrix Virtual Provisioning capability. Here is an excerpt:
I mean, seriously, who else but someone who's already implemented thin provisioning would really understand the implications of "chunk" size enough to care?
For those of you who don't know what the heck "chunk size" means (now listen up you folks over at IBM who have yet to implement thin provisioning on your own storage products), a "chunk" is the term used (and I think even trademarked by 3PAR) to refer to the unit of actual storage capacity that is assigned to a thin device when it receives a write to a previously unallocated region of the device.
For reference, Hitachi USP-V uses I think a 42MB chunk, XIV NEXTRA is definitely 1MB, and 3PAR uses 16K or 256K (depending upon how you look at it).
True, the Thin Provisioning currently offered in IBM System Storage N series was technically "implemented" by NetApp, and the Thin Provisioning that will be offered in our IBM XIV Nextra systems will have been acquired from XIV. Need I remind you that many of EMC's products were developed by other companies first, then later acquired by EMC? There is no need for you to throw rocks from your glass houses in Hopkinton.
"Thin provisioning" was first introduced by StorageTek in the 1990's and sold by IBM under the name of RAMAC Virtual Array (RVA). An alternative approach is "Dynamic Volume Expansion" (DVE). Rather than giving the host application a huge 2TB LUN but actually only use 50GB for data, DVE was based on the idea that you only give out 50GB they need now, but could expand in place as more space was required. This was specifically designed to avoid the biggest problem with "Thin Provisioning" which back then was called "Net Capacity Load" on the IBM RVA, but today is now referred to as "over-subscription". It gave Storage Administrators greater control over their environment with no surprises.
In the same manner as Thin Provisioning, DVE requires a "chunk size" to work with. Let's take a look:
On the DS4000 series, we use the term "segment size", and indicate that the choice of a segment size can have some influence on performance in both IOPS and throughput. Smaller segment sizes increase the request rate (IOPS) by allowing multiple disk drives to respond to multiple requests. Large segment sizes increase the data transfer rate (Mbps) by allowing multiple disk drives to participate in one I/O request. The segment size does not actually change what is stored in cache, just what is stored on the disk itself. It turns out in practice there is no advantage in using smaller sizes with RAID 1; only in a few instances does this help with RAID-5 if you can write a full stripe at once to calculate parity on outgoing data. For most business workloads, 64KB or 128KB are recommended. DVE expands by the same number of segments across all disks in the RAID rank, so for example in a 12+P rank using 128KB segment sizes, the chunk size would be thirteen segments, about 1.6MB in size.
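The 12+P example works out as follows. This is only a worked version of the arithmetic in the paragraph above; the helper name `dve_chunk_kb` is hypothetical and binary units (1MB = 1024KB) are assumed.

```python
def dve_chunk_kb(data_drives, parity_drives, segment_kb):
    """Smallest DVE increment: one segment on every drive in the rank."""
    return (data_drives + parity_drives) * segment_kb


chunk_kb = dve_chunk_kb(12, 1, 128)   # 13 segments of 128KB each
chunk_mb = chunk_kb / 1024            # assuming 1MB = 1024KB

print(f"{chunk_kb} KB, about {chunk_mb:.1f} MB")
```

Choosing a 64KB segment on the same rank would halve the increment, which is why the effective "chunk size" on the DS4000 follows from the rank width and segment size rather than being a single fixed number.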
SAN Volume Controller
On the SAN Volume Controller, we call this "extent size" and allow it to take various values from 16MB to 512MB. Initially, IBM only managed four million extents, so this table was used to explain the maximum amount that could be managed by an SVC system (up to 8 nodes) depending on extent size selected.
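The relationship between extent size and maximum managed capacity can be recomputed with a few lines of Python. This is a sketch under stated assumptions, not IBM's published table: "four million" is taken to mean 4 × 1024 × 1024 extents, the extent sizes are assumed to be powers of two from 16MB to 512MB, and binary units are used throughout.

```python
MAX_EXTENTS = 4 * 1024 * 1024   # "four million" extents, assumed to be 4 Mi


def max_capacity_tb(extent_mb):
    """Maximum capacity in TB that MAX_EXTENTS extents of this size can cover."""
    return MAX_EXTENTS * extent_mb // (1024 * 1024)


# Assumed list of selectable extent sizes, in MB
for extent_mb in (16, 32, 64, 128, 256, 512):
    print(f"{extent_mb:>4} MB extents -> {max_capacity_tb(extent_mb):>5} TB")
```

Under these assumptions the smallest extent size tops out at 64TB and each doubling of the extent size doubles the ceiling, up to 2PB at 512MB.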
IBM thought that since we externalized "segment size" on the DS4000, we should do the same for the SAN Volume Controller. As it turned out, SVC is so fast up in the cache, that we could not measure any noticeable performance difference based on extent size. We did have a few problems. First, clients who chose 16MB and then grew beyond the 64TB maximum addressable discovered that perhaps they should have chosen something larger. Second, clients called in our help desk to ask what size to choose and how to determine the size that was right for them. Third, we allowed people to choose different extent sizes per managed disk group, but that prevents movement or copies between groups. You can only copy between groups that use the same extent size. The general recommendation now is to specify 256MB size, and use that for all managed disk groups across the data center.
The latest SVC expanded maximum addressability to 8PB, still more than most people have today in their shops.
Getting smarter each time we introduce new function, we chose 1GB chunks for the DS8000. Based on a mainframe background, most CKD volumes are 3GB, 9GB, or 27GB in size, and so 1GB chunks simplified this approach. Spreading these 1GB chunks across multiple RAID ranks greatly reduced hot-spots that afflict other RAID-based systems. (Rather than fix the problem by re-designing the architecture, EMC will offer to sell you software to help you manually move data around inside the Symmetrix after the hot-spot is identified.)
Unlike EMC's virtual provisioning, IBM DS8000 dynamic volume expansion does work on CKD volumes for our System z mainframe customers.
The trade-off in each case was between granularity and table space. Smaller chunks allow finer control on the exact amount allocated for a LUN or volume, but larger chunks reduce the number of chunks managed. With our advanced caching algorithms, changes in chunk size did not noticeably impact performance. It is best just to come up with a convenient size, and either configure it as fixed in the architecture, or externalize it as a parameter with a good default value.
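To put rough numbers on that trade-off, here is a minimal sketch that counts how many chunks a volume needs and how much allocated space goes unused at several chunk sizes mentioned in this post. The function name `allocation` and the 50GB example volume are illustrative choices, binary units are assumed, and no real product's allocation table works exactly this way.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def allocation(used_bytes, chunk_bytes):
    """Return (chunks needed, bytes allocated but unused) for one volume."""
    chunks = -(-used_bytes // chunk_bytes)        # integer ceiling division
    slack = chunks * chunk_bytes - used_bytes
    return chunks, slack


used = 50 * GIB   # hypothetical volume holding 50GB of data
for label, size in (("256KB", 256 * KIB), ("1MB", MIB),
                    ("42MB", 42 * MIB), ("1GB", GIB)):
    chunks, slack = allocation(used, size)
    print(f"{label:>6}: {chunks:>7} table entries, {slack // MIB} MB unused")
```

Small chunks waste almost nothing but need a table entry for each one, while large chunks keep the table short at the cost of rounding every volume up to the next chunk boundary.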
Meanwhile, back at EMC, BarryB indicates that they haven't determined the "optimal" chunk size for their new function. They plan to run tests and experiments to determine which size offers the best performance, and then make that a fixed value configured into the DMX-4. I find this funny coming from the same EMC that won't participate in [standardized SPC benchmarks] because they feel that performance is a personal and private matter between a customer and their trusted storage vendor, that all workloads are different, and you get the idea. Here's another excerpt:
Back at the office, they've taking to calling these "chunks" Thin Device Extents (note the linkage back to EMC's mainframe roots), and the big secret about the actual Extent size is...(wait for it...w.a.i.t...for....it...)...the engineers haven't decided yet!
That's right...being the smart bunch they are, they have implemented Symmetrix Virtual Provisioning in a manner that allows the Extent size to be configured so that they can test the impact on performance and utilization of different sizes with different applications, file systems and databases. Of course, they will choose the optimal setting before the product ships, but until then, there will be a lot of modeling, simulation, and real-world testing to ensure the setting is "optimal."
Finally, BarryB wraps up this section poking fun at the chunk sizes chosen by other disk manufacturers. I don't know why HDS chose 42MB for their chunk size, but it has a great [Hitchhiker's Guide to the Galaxy] sound to it, answering the ultimate question to life, the universe and everything. Hitachi probably went to their Deep Thought computer and asked how big should their "chunk size" be for their USP-V, and the computer said: 42. Makes sense to me.
I have to agree that anything smaller than 1MB is probably too small. Here's the last excerpt:
Now, many customers and analysts I've spoken to have in fact noted that Hitachi's "chunk" size is almost ridiculously large; others have suggested that 3PAR's chunks are so small as to create performance problems (I've seen data that supports that theory, by the way).
Well, here's the thing: the "right" chunk size is extremely dependent upon the internal architecture of the implementation, and the intersection of that ideal with the actual write distribution pattern of the host/application/file system/database.
So my suggestion to EMC is, please, please, please take as much time as you need to come up with the perfect "chunk size" for this, one that handles all workloads across a variety of operating systems and applications, from solid-state Flash drives to 1TB SATA disk. Take months or years, as long as it takes. The rest of the world is in no hurry, as thin provisioning or dynamic volume expansion is readily available on most other disk systems today.
Maybe if you ask HDS nicely, they might let you ask their computer.
Miles per Gallon measures an efficiency ratio (amount of work done with a fixed amount of energy), not a speed ratio (distance traveled in a unit of time).
Given that IOPs and MB/s are the unit of "work" a storage array does, wouldn't the MPG equivalent for storage be more like IOPs per Watt or MB/s per Watt? Or maybe just simply Megabytes Stored per Watt (a typical "green" measurement)?
You appear to be intentionally avoiding the comparison of I/Os per Second and Megabytes per Second to Miles Per Hour?
May I ask why?
This is a fair question, Barry, so I will try to address it here.
It was not a typo, I did mean MPG (miles per gallon) and not MPH (miles per hour). It is always challenging to find an analogy that everyone can relate to explain concepts in Information Technology that might be harder to grasp. I chose MPG because it was closely related to IOPS and MB/s in four ways:
MPG applies to all instances of a particular make and model. Before Henry Ford and the assembly line, cars were made one at a time, by a small team of craftsmen, and so there could be variety from one instance to another. Today, vehicles and storage systems are mass-produced in a manner that provides consistent quality. You can test one vehicle, and safely assume that all similar instances of the same make and model will have the similar mileage. The same is true for disk systems, test one disk system and you can assume that all others of the same make and model will have similar performance.
MPG has a standardized measurement benchmark that is publicly available. The US Environmental Protection Agency (EPA) is an easy analogy for the Storage Performance Council, providing the results of various offerings to choose from.
MPG has usage-specific benchmarks to reflect real-world conditions. The EPA offers City MPG for the type of driving you do to get to work, and Highway MPG, to reflect the type of driving on a cross-country trip. These serve as a direct analogy to SPC having SPC-1 for Online transaction processing (OLTP) and SPC-2 for large file transfers, database queries and video streaming.
MPG can be used for cost/benefit analysis. For example, one could estimate the amount of business value (miles travelled) for the amount of dollar investment (cost to purchase gallons of gasoline, at an assumed gas price). The EPA does this as part of their analysis. This is similar to the way IOPS and MB/s can be divided by the cost of the storage system being tested on SPC benchmark results. The business value of IOPS or MB/s depends on the application, but could relate to the number of transactions processed per hour, the number of music downloads per hour, or number of customer queries handled per hour, all of which can be assigned a specific dollar amount for analysis.
It seemed that if I was going to explain why standardized benchmarks were relevant, I should find an analogy that has similar features to compare to. I thought about MPH, since it is based on time units like IOPS and MB/s, but decided against it based on an earlier comment you made, Barry, about NASCAR:
Let's imagine that a Dodge Charger wins the overwhelming majority of NASCAR races. Would that prove that a stock Charger is the best car for driving to work, or for a cross-country trip?
Your comparison, Barry, to car-racing brings up three reasons why I felt MPH is a bad metric to use for an analogy:
Increasing MPH, and driving anywhere near the maximum rated MPH for a vehicle, can be reckless and dangerous, risking loss of human life and property damage. Even professional race car drivers will agree there are dangers involved. By contrast, processing I/O requests at maximum speed poses no additional risk to the data, nor possible damage to any of the IT equipment involved.
While most vehicles have top speeds in excess of 100 miles per hour, most Federal, State and Local speed limits prevent anyone from taking advantage of those maximums. Race-car drivers in NASCAR may be able to take advantage of maximum MPH of a vehicle, but the rest of us can't. The government limits speed of vehicles precisely because of the dangers mentioned in the previous bullet. In contrast, processing I/O requests at faster speeds poses no such dangers, so the government imposes no limits.
Neither IOPS nor MB/s match MPH exactly. Earlier this week, I related IOPS to "Questions handled per hour" at the local public library, and MB/s to "Spoken words per minute" in those replies. If I tried to find a metric based on unit type to match the "per second" in IOPS and MB/s, then I would need to find a unit that equated to "I/O requests" or "MB transferred" rather than something related to "distance travelled".
In terms of time-based units, the closest I could come up with for IOPS was acceleration rate of zero-to-sixty MPH in a certain number of seconds. Speeding up to 60MPH, then slamming the brakes, and then back up to 60MPH, start-stop, start-stop, and so on, would reflect what IOPS is doing on a request by request basis, but nobody drives like this (except maybe the taxi cab drivers here in Malaysia!)
Since vehicles are limited to speed limits in normal road conditions, the closest I could come up with for MB/s would be "passenger-miles per hour", such that high-occupancy vehicles like school buses could deliver more passengers than low-occupancy vehicles with only a few passengers.
Neither start-stops nor passenger-miles per hour have standardized benchmarks, so they don't work well for comparison between vehicles. If you or anyone can come up with a metric that will help explain the relevance of standardized benchmarks better than the MPG that I already used, I would be interested in it.
You also mention, Barry, the term "efficiency" but mileage is about "fuel economy". Wikipedia is quick to point out that although the fuel efficiency of petroleum engines has improved markedly in recent decades, this does not necessarily translate into fuel economy of cars. The same can be said about disk systems: faster internal bandwidth of the backplane between controllers and faster HDD do not necessarily translate to external performance of the disk system as a whole. You correctly point this out in your blog about the DMX-4:
Complementing the 4Gb FC and FICON front-end support added to the DMX-3 at the end of 2006, the new 4Gb back-end allows the DMX-4 to support the latest in 4Gb FC disk drives.
You may have noticed that there weren't any specific performance claims attributed to the new 4Gb FC back-end. This wasn't an oversight, it is in fact intentional. The reality is that when it comes to massive-cache storage architectures, there really isn't that much of a difference between 2Gb/s transfer speeds and 4Gb/s.
Oh, and yes, it's true - the DMX-4 is not the first high-end storage array to ship a 4Gb/s FC back-end. The USP-V, announced way back in May, has that honor (but only if it meets the promised first shipments in July 2007). DMX-4 will be in August '07, so I guess that leaves the DS8000 a distant 3rd.
This also explains why the IBM DS8000, with its clever "Adaptive Replacement Cache" algorithm, has such high SPC-1 benchmarks despite the fact that it still uses 2Gbps drives inside. Given that it doesn't matter between 2Gbps and 4Gbps on the back-end, why would it matter which vendor came first, second or third, and why call it a "distant 3rd" for IBM? How soon would IBM need to announce similar back-end support for it to be a "close 3rd" in your mind?
I'll wrap up with your excellent comment that Watts per GB is a typical "green" metric. I strongly support the whole "green initiative" and I used "Watts per GB" last month to explain about how tape is less energy-consumptive than paper. I see on your blog you have used it yourself here:
The DMX-3 requires less Watts/GB in an apples-to-apples comparison of capacity and ports against both the USP and the DS8000, using the same exact disk drives
It is not clear if "requires less" means "slightly less" or "substantially less" in this context, and I have no facts from my own folks within IBM to confirm or deny it. Given that tape is orders of magnitude less energy-consumptive than anything EMC manufactures today, the point is probably moot.
I find it refreshing, nonetheless, to have agreed-upon "energy consumption" metrics to make such apples-to-apples comparisons between products from different storage vendors. This is exactly what customers want to do with performance as well, without necessarily having to run their own benchmarks or work with specific storage vendors. Of course, Watts/GB consumption varies by workload, so to make such comparisons truly apples-to-apples, you would need to run the same workload against both systems. Why not use the SPC-1 or SPC-2 benchmarks to measure the Watts/GB consumption? That way, EMC can publish the DMX performance numbers at the same time as the energy consumption numbers, and then HDS can follow suit for its USP-V.
I'm on my way back to the USA soon, but wanted to post this now so I can relax on the plane.
Well it's Wednesday, and you know what that means... IBM Announcements.
(Normally, announcements are on Tuesdays, but we moved this one over to Wednesday to line up with our big launch event in Pinehurst, NC.)
A lot was announced today, so I decided to break it up into several separate posts. I will start with our Enterprise Systems: DS8870, TS7700 Release 3, and XIV Gen3.
Enterprise systems are the servers, storage and software at the core of an enterprise IT infrastructure. Enterprise systems enable a private cloud infrastructure at enterprise scale, with flexible service delivery models that provide dynamic efficiency for resource and workload management. They make sure critical data is always available across the enterprise, making it accessible in new ways so that actionable insights can be derived from advanced and operational analytics. They also provide ultimate security, ensuring the integrity of critical data while mitigating risk and providing assured compliance.
IBM System Storage DS8870® disk system
This new storage system is the next generation in IBM's DS8000 series, based on IBM's POWER7 chipset. Each CEC can have 2, 4, 8 or 16 cores. Like the DS8800, you can have a mix of 2.5-inch and 3.5-inch disk drives of different speeds and capacities, up to 1,536 drives in a four-frame configuration. The maximum cache is now 1TB usable. The combination of faster chipset and more cache can triple performance for some workloads!
All DS8870s ship standard with Full Disk Encryption (FDE) capable drives. The problem in the past was that people would buy DS8000 with non-FDE drives, and then later want to activate encryption, and discover that they had to swap out their drives with those with the encryption chip built in. Now, all drives on the DS8870 will have the encryption chip. This also allows Easy Tier sub-volume automated tiering to move encrypted data between all media types.
Flash optimization with DS8000 Easy Tier can improve performance up to 3 times with 3% of data on solid-state storage. Easy Tier is easy to deploy and runs automatically.
Support of the American National Standards Institute's (ANSI) T10 Data Integrity Field (DIF) standard. This is a feature that the mainframe has had for years, and is now being extended to distributed operating systems. The concept is simple. When sending data between server and storage, generate a checksum at the source, and then validate the checksum at the target. When you write a block of data, the server generates the checksum, and the DS8870 validates the checksum on arrival. When you read the data back, the DS8870 generates the checksum, and the server validates it on arrival. This ensures that data was not corrupted in between. There is a great write-up on IBM developerWorks: [End-to-end data protection using T10 standard data integrity field].
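The generate-at-source, validate-at-target idea can be sketched in a few lines of Python. This is only an illustration of the concept: it uses `zlib.crc32` as a stand-in checksum, not the protection field that the T10 standard actually defines, and the function names `protect` and `verify` are made up for this example.

```python
import zlib

def protect(block: bytes) -> tuple:
    """Source side: compute a checksum and send it along with the block."""
    return block, zlib.crc32(block)

def verify(block: bytes, checksum: int) -> bool:
    """Target side: recompute the checksum on arrival and compare."""
    return zlib.crc32(block) == checksum

# Write path: the server generates the checksum, the storage side validates it.
payload, guard = protect(b"A" * 512)
write_ok = verify(payload, guard)

# Simulate corruption in flight by flipping a single bit of the first byte.
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]
corruption_detected = not verify(corrupted, guard)
```

On the read path the roles simply reverse: the storage side calls `protect` and the server calls `verify`.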
Energy Efficient. The DS8870 consumes less energy than its predecessor, the DS8800. For example, a fully-configured four-frame DS8870 with 1,536 disk drives consumes only 23.2kW, while a DS8800 with the same number of drives consumed 26.3kW. By comparison, the DS8700 with five frames and 1,024 drives consumed 29.2kW.
Support for new System z load balancing algorithm. System z Workload Manager now interacts with the DS8870 I/O Priority Manager to optimize designated Quality of Service (QoS) levels. We also have the fastest operational analytics solution, with DB2 List Prefetch cache optimization and DS8870 High Performance FICON (zHPF) integration. This solution increases DB2 query performance up to 11 times with disk, and up to 60 times with solid-state drives (SSD). File scans are up to 30 percent faster using DS8870 zHPF support for sequential access methods (QSAM, BPAM, and BSAM).
VMware vStorage APIs for Array Integration (VAAI) support. Why should the IBM DS8800 series support VMware when IBM already offers great VMware support with SAN Volume Controller (SVC), Storwize V7000 and XIV storage systems? Good question. This was hotly debated between development and marketing. Several DS8000 customers have already added SVC to provide full VMware VAAI support. As a consultant, I am neither development nor marketing, but felt it necessary to weigh in with my opinion on this. The DS8000 is a consolidation platform. According to one analyst survey, 22 percent of companies run on a single disk platform, so for DS8000 to be the one, it needs to support VMware and exploit these special APIs.
Six Nines Availability. Critical enterprise systems need to deliver continuous data availability, or very close to it. IBM solutions can help deliver up to six “nines” of availability, or 99.9999 percent, when combining DS8000 Metro Mirror and GDPS Hyperswap. That's roughly 32 seconds of downtime per year.
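For anyone who wants to check what a given number of "nines" allows, the arithmetic is simple. Here is a minimal Python sketch; the function name is my own, and it assumes a 365-day year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years

def downtime_seconds_per_year(availability_percent: float) -> float:
    """Seconds of allowed downtime per year at a given availability level."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)

six_nines = downtime_seconds_per_year(99.9999)
five_nines = downtime_seconds_per_year(99.999)
```

Six nines works out to roughly half a minute per year, while five nines allows a little over five minutes.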
The TS7700 Release 3 represents a refresh to our existing virtual tape libraries. These are mainframe-only, offered in two models: TS7720 is a disk-only device, and the TS7740 is a blended disk-and-tape solution.
Industry standard hardware encryption. This applies to user data stored on the TS7700 system cache (disk), and for data transferred between TS7700 systems. This is especially important for regulations, like Payment Card Industry Data Security Standard (PCI-DSS). In previous models, the data would not be encrypted until it was moved off disk and written to tape. Now, it is encrypted the minute it lands on the disk cache, and stays encrypted as it is replicated from one TS7700 to another in the grid.
Up to 4 Million logical volume capacity. This is twice the previous support.
More physical capacity for TS7720 systems. The maximum capacity for the disk-only model is raised from 440TB to 620TB, representing a 40 percent increase.
My latest book "Inside System Storage: Volume V" is now available!
I have published the fifth volume in my "Inside System Storage" series! Currently, it is only available in Paperback. My editor, Susan Pollard, is hoping to have the eBook and Hardcover versions ready for Cyber Monday. The foreword was written by Dr. Sondra Ashmore.
You can order this, and all my other books, in all formats, directly from my [Author Spotlight] page. The paperback will also be available soon from other online booksellers, search for ISBN 978-1-300-26223-7.
Improved Scalability. A new Multi-system Manager (MSM) server reduces the operational complexity for large and multi-site XIV deployments. Previously, admins connected directly to XIV boxes. If you had 10 admins logged in, then every XIV box was managing 10 admin conversations. The new MSM acts as a go-between. The admins connect to the MSM, and the MSM connects to the XIV boxes. The MSM polls and caches the status of each XIV, greatly increasing the number of XIV boxes that an admin can manage.
Enhanced User Interface. A new Multi-system Manager server reduces the operational complexity for large and multi-site XIV deployments. We also added support for IPsec and U.S. Government (USGv6) certification for administering the XIV over IPv6 networks. The XIV Mobile Dashboard app for iPhone and iPad is spiffed up. Finally, the GUI has been internationalized and translated to the Japanese language.
Enhanced Integration for Cloud. For OpenStack, XIV now offers a Nova-volume driver which provides persistent storage to OpenStack compute nodes. The Nova task force is now looking to move storage into its own project called Cinder. For VMware, XIV has full support for Site Recovery Manager (SRM) v4.1 and v5.0 releases. XIV now also supports the Microsoft System Center Virtual Machine Manager, which can manage Hyper-V, VMware and Citrix XenServer hypervisors.
Smaller entry point. The original XIV supported 1TB and 2TB drives, with the smallest offering being 27TB usable. When IBM introduced the XIV Gen3, the two choices were 2TB and 3TB disk drives. Unfortunately, this meant that the initial entry model was now 55TB in size, and each additional module would be more expensive as well. IBM is now going to offer 1TB support for XIV Gen3 for a lower price point. These are actually 2TB drives with half the capacity turned off.
Well, it's Tuesday again, and that means more IBM announcements!
Today, IBM announced the enhanced IBM System Storage DS3200 disk system. It is part of our DS3000 series: the DS3200 is SAS-attach, the DS3300 is iSCSI-attach, and the DS3400 is FC-attach. All of them support up to 48 drives, which can be a mix of SAS and SATA drives.
The DS3200 supports the following operating environments (see IBM's [Interop Matrix] for details):
Linux (both Linux-x86 and Linux on POWER)
With today's announcements, the DS3200 can be used to boot from, as well as contain data. This is ideal to combine with IBM BladeCenter. With the IBM BladeCenter you can have 14 blades, either x86 or POWER based processors, attached to a DS3200 via SAS switch modules in the back of the chassis.
Let's take an example of how this can be used for a Scale-Out File Services[SoFS] deployment.
First, we start with servers. We could use three [IBM System x3650] servers, but this would use up all six of the direct-attach ports. Instead, we'll choose the [BladeCenter H chassis], with three HS21 blades for SoFS, which leaves us with eleven empty blade slots for a management node, or other blades to run applications.
SAS connectivity modules
The IBM BladeCenter [SAS Connectivity Module] allows the blade servers to connect to a DS3200. Two of them fit right in the back of the BladeCenter chassis, providing full redundancy without consuming additional rack space.
DS3200 and EXP3000 expansion drawers
We'll have one DS3200 controller with twelve internal drives, and three expansion EXP3000 drawers with twelve drives each, for a total of 48 drives. Using 1TB SATA, this would be 48 TB raw capacity.
The end result? You get a 48TB NAS scalable storage solution, supporting up to 7500 concurrent CIFS and NFS users, with up to 700 MB/sec on large block transfers. By using BladeCenter, you can expand performance by adding more blades to the chassis, or give some blades running SAP or Oracle RAC direct read/write access to the SoFS data.
Just another example of how IBM can bring together all the components of a solution to provide customer value!
Intelligent block-level disk array that virtualizes both internal and external disk storage
8 Gbps FCP and 1GbE iSCSI
IBM Storwize V7000 disk system
Real-time compression appliance for files
10GbE/1GbE CIFS and NFS
Storwize, now an IBM company
IBM Real-time Compression STN-6800 appliance
1GbE CIFS and NFS
IBM Real-time Compression STN-6500 appliance
If you think this is the first time a company like IBM has pulled shenanigans with product names like this, think again. Here are a few posts that might refresh your memory:
In my September 2006 post, [A brand by any other name...], I explain that I started blogging specifically to promote the new "IBM System Storage" product line name, part of the "IBM Systems" brand resulting from merging the "eServer" and "TotalStorage" brands.
In my January 2007 post, [When Names Change], I explain our naming convention for our disk products, including our DS family, SAN Volume Controller and N series.
In my February 2008 post, [Getting Off the Island], I cover how the x/p/i/z designations came about for our various IBM server product lines.
But what about acquisitions? When [IBM acquired Lotus Development Corporation], it kept the "Lotus" brand. New products that fit the "collaboration" function were put under the Lotus brand. I think most people can accept this approach.
But have we ever seen an existing product renamed to an acquired name?
In my January 2009 post, [Congratulations to Ken on your QCC Milestone], I mentioned that my colleague Ken Hannigan worked on an internal project initially called "Workstation Data Save Facility" (WDSF) which was changed to "Data Facility Distributed Storage Manager" (DFDSM), then renamed to "ADSTAR Distributed Storage Manager" (ADSM), and finally renamed to the name it has today: IBM Tivoli Storage Manager (TSM).
Readers reminded me that [IBM acquired Tivoli Systems, Inc.] in 1996, so TSM could not have been an internally developed product. Ha! Wrong! Let's take a quick history lesson on how this came about:
In the late 1980s, IBM Almaden research had developed a project to backup personal computers and workstations, which they called "Workstation Data Save Facility" or WDSF.
This was turned over to our development team, which immediately discarded the code, and wrote from scratch its replacement, called Data Facility Distributed Storage Manager (DFDSM), named similarly to the Data Facility products on the mainframe (DFP, DFHSM, DFDSS). As a member of the Data Facility family, DFDSM didn't really fit. The rest processed mainframe data sets, but DFDSM processed Windows and UNIX files. That a version of DFDSM server was available to run on the mainframe was the only connection.
Then, in the early 1990s, there were discussions of possibly splitting IBM into a bunch of smaller "Baby Blues", similar to how [AT&T was split into "Baby Bells"], and how Forbes and Goldman Sachs now want to split Microsoft into [Baby Bills]. IBM considered naming the storage spin-off as ADSTAR, which stood for "Advanced Storage and Retrieval."
Pre-emptively, IBM renamed DFDSM to "ADSTAR Distributed Storage Manager" or ADSM.
Fortunately, in 1993, IBM brought a new sheriff to town, Lou Gerstner, who quickly squashed any plans to split up IBM. He quickly realized that IBM's core strength was building integrated stacks, combining systems, software and services to solve business problems.
In 1996, IBM acquired Tivoli Systems, Inc. to expand its "Systems Management" portfolio, and renamed ADSM over to IBM Tivoli Storage Manager, since "storage management" is an essential part of "systems management". Later, IBM TotalStorage Productivity Center would be renamed to "IBM Tivoli Storage Productivity Center."
I participated in five months of painful meetings to figure out what to name our new internally-developed midrange disk system. Since it ran SAN Volume Controller software, I pushed for keeping the SVC designation somehow. We considered the DS naming convention, but the new midrange product would not fit between our existing DS5000 and DS6000 numbering scheme. A marketing agency we hired came up with nonsensical names, in the spirit of product names like Celerra, Centera and CLARiiON, using name generators like [Wordoid]. Luckily, in the nick of time, IBM acquired Storwize for its compression technology, and decided that Storwize as a name was a way better fit than any of the names we had come up with already.
However, the new IBM Storwize V7000 midrange product had nothing in common with the appliances acquired from Storwize, the company, so to avoid confusion, the latter products were renamed to [IBM Real-time Compression]. Fellow blogger Steven Kenniston, the Storage Alchemist of Storwize fame, now part of IBM through the acquisition, gives his perspective on this in his post [Storwize – What is in a Name, Really?]. While I am often critical of the names and terms IBM uses, I have to say this last set of naming decisions makes a lot of sense to me and I support it wholeheartedly.
In preparation for my [upcoming trip to Australia and New Zealand], I decided to upgrade my smartphone. My service provider T-Mobile offered me the chance to try out any new phone for 14 days for only a ten-dollar restocking fee. For the past 16 months, I have used the Google G1 phone. This is based on a storage-optimized Android operating system, based on open source Linux, with applications processed in a storage-optimized virtual machine called Dalvik, based on open source Java. According to Wikipedia, Android-based phones have #1 market share [outselling both BlackBerry OS and Apple iOS phones]. There are over 70 different companies using Android, driven away from the proprietary interfaces from Apple, BlackBerry and Microsoft.
Since I was already familiar with the Android operating system, I chose the Samsung Galaxy S Vibrant. I liked my G1, but it had only a small amount of internal memory to store applications. The G1 supported an external Micro SDHC card, but this was used only for music and photos. There was no way to install applications on the memory card, so I found myself having to uninstall applications to make room for new ones. By contrast, the Vibrant has 16GB internal memory, plenty of room for all applications, and supports Micro SDHC up to 32GB in size. My model came pre-installed with a 2GB card, of which 1.4GB is consumed by James Cameron's full-length movie Avatar. On the G1, swapping out memory cards was relatively easy. On the Vibrant, you have to take the phone apart to swap out cards, so I won't be doing that very often. I will probably just get a 32GB card and leave it in there permanently.
(FTC disclosure: I work for IBM. IBM has working relationships with Oracle, Google, and lots of other companies. IBM offers its own commercial version of Java related tools. I own stock in IBM, Apple, Google. I have friends and family who work at Microsoft. My review below is based entirely on my own experience of my new Samsung Galaxy S Vibrant phone. Samsung has created different models for different service providers. The T-Mobile Vibrant is an external USB storage device with telephony capabilities, comparable to the AT&T Captivate, Verizon Fascinate, or Sprint Epic 4G. The majority of mobile phones in the world contain IBM technology. This post is not necessarily an endorsement for Samsung over other smartphone manufacturers, nor T-Mobile over other service providers. I provide this information in context of storage optimization, state-of-the-art for smartphones in general, and disputes related to software patents between companies. I hold 19 patents, most of which are software patents.)
When Oracle acquired Sun Microsystems, it inherited stewardship of Java. Java is offered in two flavors. Java Standard Edition (SE) for machines that are planted firmly on or below your desk, and Java Micro Edition (ME) for machines that are carried around. Most Java-based phones limit themselves to Java ME, but Google decided to base its smartphones on the more powerful Java SE, but then optimize for the limited storage and computing resources. These two levels of Java have radically different licensing terms and conditions, so Larry Ellison of Oracle cried foul. On The Register, Gavin Clarke has an excellent article with details of the Oracle-vs-Google complaint. Daniel Dilger opines that Oracle [might kill Google’s Android and software patents all at once]. Fellow blogger Mark Twomey (EMC) on his StorageZilla blog, argues that [it's not about Android phones, but Android everything].
My Vibrant is roughly the size of a half-inch stack of 3x5 index cards in my hand. In my humble opinion, the problem is the grey area between mobile phone and the desktop personal computer. Laptops, netbooks, iPads, tablet computers, eBook readers, and smartphones fall somewhere in between. At what point do you stop licensing Java SE and start licensing Java ME instead?
Let's take a look at all the stuff my new Samsung Vibrant can do, and let you decide for yourself. I have 140 applications installed, which I can access alphabetically. I also have up to seven screens which I can fill with application icons and widgets to simplify access. The screen measures about 4 inches diagonally. Click on each image below to see the full 480x800 resolution.
Each screen has five rows. On my first screen, I have the first two rows related to photography. This includes a camera, camcorder, bar-code scanner and visual search engine (Google Goggles). I am not happy with the Flickr Droid app for uploading photos, so I might need to find another app for that. Other reviews I read complain that the Vibrant's camera does not have an LED flash for night time shots, and that there is no forward facing camera to do Skype or FaceTime-style videoconferencing. I think it is fine the way it is. An interesting feature of the camera app is that it uses the volume up/down buttons to zoom in and out.
The next two rows are related to books and documents. In addition to both Amazon's Kindle and Barnes and Noble's Nook eBook readers, I have Dropbox to make it easy to transfer files between all my machines, a camera-scanner that generates PDFs, and ThinkFree, which appears to be based on OpenOffice open source software to create, view and edit Word documents, Excel spreadsheets and PowerPoint presentations.
My second screen is for music and video entertainment.
The top row is consumed by a single widget for [Pandora], an internet radio station, not to be confused with the Pandora moon that the movie Avatar is based on. I-heart-radio, Slacker, and Last.fm are other internet radio stations. Be careful when roaming in another country, as the $15-per-MB transfer fees can really add up. While the Galaxy S has a built-in FM radio, T-Mobile has decided to disable this feature in its Vibrant model, in favor of internet-based radio stations.
I am glad the Samsung Vibrant uses the same 3.5mm combo audio jack that I mentioned in my blog post about my [New ThinkPad T410]. This allows me to use the same headset for both my laptop and my cell phone.
For those who use Microsoft Windows Media Player v10 or above, this phone lets you transfer over your songs, playlists and videos via the USB cable in PMC mode. The TED application shows 18-minute videos of lectures at conferences that focus on Technology, Entertainment and Design. MobiTV offers live streaming of popular Television shows, normally ten dollars monthly, but I got a free 30-day trial in the deal.
Screen 3 is focused on travel. I have a 30-day free trial of GoGo, the new Wi-Fi networks on various airlines. Hopefully, I will get to try this out on my upcoming flights. When GoGo is not available, the Extended Controls widget allows me to turn the phone into "Airplane mode", which would allow me to read eBooks and listen to pre-recorded music and videos stored on my phone. Most of the apps on Android are free, but Extended Controls, shown here in the top row, cost me money but well worth it. With this you can customize different size widgets with all the appropriate setting toggles you want. On this one, I can toggle Wi-Fi, Data transfer, GPS positioning, and Airplane mode.
Google Maps, Google Places and Google Sky Map are all well represented here. I also like TripIt, which is a free Software-as-a-Service for managing your trip itinerary, and syncs up with their online website. Currency and Language translation can help on international travel. The standard Alarm Clock also includes Time Zone conversion.
My screen 4 is my central home page. There are four buttons on the bottom of the phone: Menu, Home, Back, and Search. Hit the "Home" button on any screen, and it jumps immediately to Screen 4. From here, I can get to any of the other screens with just swiping my finger across the surface. Therefore, I chose to keep this screen simple.
For meetings, I have a big clock, and an Extended Controls widget to set my phone on silent/vibrate mode, and show my battery status. I put icons here for apps that I might need in a hurry, like Camera, Evernote, or Shazam. For those not familiar with Shazam, it will listen to the microphone for whatever song is playing in the background where you are, and it will identify the song's title and artist.
The "Starred" folder lists those five or so contacts that I have marked with a "star" to be on this short list. From here, I can call or send them an SMS text message.
Screen 5 is for office productivity. I have a 2x2 widget from Astrid to list my to-do items. I have a 1x2 widget showing my last call. My calendar syncs up with my Google calendar online.
The Locale widget allows me to change which on-screen keyboard to use. There is the standard Android keyboard which allows voice-to-text input, the Samsung keyboard that offers [XT9 mode], and the new ["Swype"] keyboard that allows you to write words quickly with squiggles swiped across the keyboard. Swype is incredibly accurate when I am typing in English. When I am communicating in Spanish, it gets in the way, spell-checking when it shouldn't.
Screen 6 is for my social media, news and search facilities. I have HootSuite Lite for managing my Twitter and Facebook posts. For news junkies, NPR, USA Today and CNN all offer mobile versions.
I have a selection of browsers, including Opera Mini 5, and Dolphin Browser HD. The latter offers a variety of special add-ons similar to Firefox on a desktop system. I also have specialty search sites, including the Internet Movie Database (IMDB), Fandango for local movie times, and Dex for local phone listings.
Screen 7 is for system administration. The top row is another "Extended Controls" widget, this time to change between 2G and 3G networks, adjust the brightness setting, set the time-out interval for when the screen should automatically shut off, and a "stay awake" toggle to turn off the screen saver altogether.
I can do some really powerful things here. For example, I have an application to let me use secure shell (ssh) to access our systems at work. I also can "tether" my laptop to my Vibrant, for those few times when Wi-Fi is not available, to let my laptop use the phone's signal as a dial-up modem. It is slower than Wi-Fi, but might be just what I need in a pinch.
The bottom row is the same across all seven screens, which you can customize. I left the bottom row in its original default, with options to make phone calls, look up contacts, and send text messages. The bottom right corner launches a list of all applications alphabetically, to access those not on my seven main screens.
Just in case I switch to a local SIM card while abroad in another country, I asked T-mobile to unlock my phone, which they happily did at no additional charge. For example, while I am in Australia, I can either leave my T-Mobile USA chip in the phone, and pay roaming charges per minute, or I can purchase a SIM chip from a local phone company with pre-paid minutes. This often includes unlimited free incoming calls to a local Australian phone number, and voicemail.
Unlocking the phone to use different SIM cards is different than "jailbreaking", a term that refers to Apple's products. For Android phones, jailbreaking is called "rooting", as the process involves getting "root" user access that you normally don't have. The only reason I have found to have my phone "rooted" was to take these lovely screen shots, using the "Screen Shot It" application. This is another application that I paid for. I used the free trial for a few screenshots first to check it out, liked the results, and bought the application.
So, this new smartphone looks like a keeper. I got a screen protector to avoid scratching, and a two-piece case that snaps around the phone to give it more heft. All my chargers are "Mini USB" for my old G1 phone, and this new Vibrant phone is "Micro USB" instead, so I had to order new ones for my car, my office, and for my iGo (tip A97).
This review is meant more to focus on the fact that the IT industry is changing, and what was traditionally performed on personal computers is now being done on new handheld devices. Android provides a platform for innovation and healthy competition. Let's all hope Oracle and Google can work out their differences amicably.
Continuing my saga for my [New Laptop], I have gotten all my programs operational, transferred and organized all my data, and am now ready for testing. You can read my previous posts on this series: [Day 1], [Day 2], [Day 3], [Day 4].
At this point, you might be thinking, "Testing? Just use your laptop already, deal with problems as you find them!" In my case, I need to sign off that the new laptop meets my needs, and then send back my previous laptop, wiped clean of all passwords and data. I have until the end of June to do this.
The value of testing is to avoid problems later, perhaps at an inconvenient time such as a business trip or client briefing. It is better to work out any issues while I am still in the office, connected to the internal IBM intranet on a high-speed wired connection. Also, I plan to do a Physical-to-Virtual (P-to-V) conversion of my Windows XP C: drive to run as a virtual guest OS on Linux, so I want to make sure the image is in working order before the conversion. That said, here is what my testing encountered.
Of the 134 applications I had identified as being installed on my old laptop, I determined that I only needed about 70 of them. The others I did not bother to install on the new.
I had not thought about "addons" and "plugins" that I have that attach themselves inside browsers or other applications. I made sure that Flash, Shockwave and Java worked correctly on all three browsers: IE6, Firefox and Opera.
One of my "plugins" is an application called [iSpring Pro], which plugs into Microsoft PowerPoint. I thought I had Microsoft Office installed, but found out the standard IBM build had only the viewers. I installed Microsoft Office 2003 Standard Edition with PowerPoint, Excel and Word. I then realized that I did not have the original v4.3 installation file for iSpring Pro, so I downloaded the latest v5 from their website. However, my license key is only for version 4, so a quick email got this resolved, and the nice folks at iSpring Solutions sent me the v4.3 installation file.
Shameless Plug: We use iSpring Pro to record our voices with PowerPoint slides to generate web videos for the [IBM Virtual Briefing Center] which we use to complement face-to-face briefings. This allows attendees to review introductory materials to prepare for their visit to Tucson, or to stay up-to-date on products and features in between annual visits. If you have not checked out the IBM Virtual Briefing Center, now is a good time to see what videos and other resources we have out there. You can even request to schedule a briefing in Tucson!
Testing out iSpring Pro, I realized that there are no jacks for my headset. On my old ThinkPad T60, I had two jacks, one green for headphone and one pink for microphone. My headset has two cables, one for each, which I then use for the recordings. I also use this for online webinars and training sessions. Apparently, ThinkPad T410 went for a single 3.5mm "Combo" audio jack that handles both roles. Fortunately, there is a [Headset Buddy] adapter that merges the two cables from my headset to the combo jack on my new laptop. I ordered one which will arrive some time next week.
My new laptop doesn't fit my old docking station either. I had set the docking station aside while I had the two laptops latched together for the file transfers, but now that I am done with the old laptop, I discovered that my new T410 doesn't fit. I ordered a new one.
Using find, grep, awk, sort and uniq, I was able to generate a list of all the file extensions in my Documents folder. I was able to find old Lotus 123, Freelance Graphics, and Wordpro files. I thought Lotus Symphony would handle these, but it does not. I was able to install an old version of Lotus Smartsuite that includes these programs so that I can process these files.
I also found in the extensions list pptx, docx and xlsx files, which represent the new Microsoft Office 2007 formats. I installed the "Format Compatibility Pack" that allows Office 2003 to read these files.
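For those who prefer a script over a shell pipeline, the same kind of extension inventory can be sketched in Python. This is not the command I actually ran; the function name `count_extensions` and the "(none)" label for extension-less files are my own choices for this example.

```python
from collections import Counter
from pathlib import Path

def count_extensions(root: str) -> Counter:
    """Walk a folder tree and count files by lower-cased extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts

# Example usage (the folder name is a placeholder):
#   for ext, n in count_extensions("Documents").most_common():
#       print(n, ext)
```

Sorting by count puts the common formats at the top, which makes the odd, old formats at the bottom of the list easy to spot.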
Lastly, I installed a few programs that support a wide variety of file formats. VideoLAN's [VLC] plays a variety of audio and video files. [7-Zip] packs and unpacks a variety of archive files. (Note: Another program, BitZipper, also supports a variety of archive formats, but the install will corrupt your Firefox and IE browsers with new tool bars, change your search engine default, and install a lot of other unwanted software. Cleaning up the mess can be time-consuming. You have been warned!) I also installed [MadEdit], a binary/hex/text editor that will open any file to see what kind of format it has inside. From this, I was able to determine that some of my extension-less files were GIF, RTF or PDF format, and rename them accordingly.
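I did this identification by eye in the hex editor, but the check is easy to automate, since these formats start with well-known signature bytes. Here is a minimal Python sketch; the function name `sniff_format` is hypothetical, and the table covers only the three formats mentioned above.

```python
from typing import Optional

# Leading "magic" bytes for the three formats mentioned above.
SIGNATURES = {
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
    b"{\\rtf": "rtf",
}

def sniff_format(header: bytes) -> Optional[str]:
    """Return a likely extension based on the first bytes of a file, or None."""
    for magic, extension in SIGNATURES.items():
        if header.startswith(magic):
            return extension
    return None

# Example usage (the file name is a placeholder):
#   with open("mystery_file", "rb") as f:
#       print(sniff_format(f.read(16)))
```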
With the testing done, I am ready to go wipe my old system of all passwords and data!
By combining multiple components into a single "integrated system", IBM can offer a blended disk-and-tape storage solution. This provides the best of both worlds: high speed access using disk, with the lower costs and better energy efficiency of tape. According to a study by the Clipper Group, tape can be 23 times less expensive than disk over a 5 year total cost of ownership (TCO).
I've also covered Hierarchical Storage Management, such as my post [Seven Tiers of Storage at ABN Amro], and my role as lead architect for DFSMS on z/OS in general, and DFSMShsm in particular.
However, some explanation might be warranted in the use of these two terms in regards to SONAS. In this case, ILM refers to policy-based file placement, movement and expiration on internal disk pools. This is actually a GPFS feature that has existed for some time, and was tested to work in this new configuration. Files can be individually placed on either SAS (15K RPM) or SATA (7200 RPM) drives. Policies can be written to move them from SAS to SATA based on size, age and days non-referenced.
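To make the idea of a placement policy concrete, here is a small Python sketch of a rule that picks a pool based on size, age and days non-referenced. This is not GPFS policy syntax, and the thresholds and names (`FileInfo`, `choose_pool`) are hypothetical values picked purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
MAX_SAS_SIZE_BYTES = 1024 ** 3   # files larger than 1 GiB go to SATA
MAX_SAS_AGE_DAYS = 365           # files older than a year go to SATA
MAX_SAS_IDLE_DAYS = 90           # files untouched for 90 days go to SATA

@dataclass
class FileInfo:
    size_bytes: int
    age_days: int
    days_non_referenced: int

def choose_pool(f: FileInfo) -> str:
    """Return the disk pool a file should live on under this example policy."""
    if (f.size_bytes > MAX_SAS_SIZE_BYTES
            or f.age_days > MAX_SAS_AGE_DAYS
            or f.days_non_referenced > MAX_SAS_IDLE_DAYS):
        return "SATA"
    return "SAS"
```

A small, recently referenced file stays on SAS, while anything large, old or idle is a candidate to move to SATA.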
HSM is also a form of ILM, in that it moves data from SONAS disk to external storage pools managed by IBM Tivoli Storage Manager. A small stub is left behind in the GPFS file system indicating the file has been "migrated". Any reference to read or update this file will cause the file to be "recalled" back from TSM to SONAS for processing. The external storage pools can be disk, tape or any other media supported by TSM. Some estimate that as much as 60 to 80 percent of files on NAS have low reference and should be stored on tape instead of disk, and now SONAS with HSM makes that possible.
This distinction allows the ILM movement to be done internally, within GPFS, and the HSM movement to be done externally, via TSM. Both ILM and HSM movement take advantage of the GPFS high-speed policy engine, which can process 10 million files per node, run in parallel across all interface nodes. Note that TSM is not required for ILM movement. In effect, SONAS brings the policy-based management features of DFSMS for z/OS mainframe to all the rest of the operating systems that access SONAS.
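To make the ILM versus HSM distinction concrete, here is a rough Python sketch of the kind of decision such a policy expresses. This is not GPFS policy syntax, and every threshold and name below is a made-up assumption for illustration. The point is only that size, age and days non-referenced drive the choice between the internal SAS and SATA pools (ILM) and an external pool managed by TSM (HSM).

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    size_bytes: int          # current file size
    age_days: int            # days since the file was created
    days_since_access: int   # days non-referenced


# Hypothetical thresholds, chosen only for illustration.
SATA_MIN_SIZE_BYTES = 1 * 1024**3   # large files go to SATA
SATA_MIN_AGE_DAYS = 30              # older files go to SATA
SATA_MIN_IDLE_DAYS = 14             # idle files go to SATA
EXTERNAL_MIN_IDLE_DAYS = 180        # long-idle files migrate to the external pool


def choose_pool(info: FileInfo) -> str:
    """Return 'sas', 'sata' (internal, ILM) or 'external' (HSM via TSM)."""
    if info.days_since_access >= EXTERNAL_MIN_IDLE_DAYS:
        return "external"
    if (info.size_bytes >= SATA_MIN_SIZE_BYTES
            or info.age_days >= SATA_MIN_AGE_DAYS
            or info.days_since_access >= SATA_MIN_IDLE_DAYS):
        return "sata"
    return "sas"
```

A small, freshly written file stays on SAS, a large or aging one moves to SATA, and anything untouched for half a year becomes a candidate for migration to the external pool.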
HTTP and NIS support
In addition to NFS v2, NFS v3, and CIFS, SONAS v1.1.1 adds the HTTP protocol. IBM plans to add more protocols in subsequent releases. Let me know which protocols you are interested in, so I can pass that along to the architects designing future releases!
SONAS v1.1.1 also adds support for Network Information Service (NIS), a client/server based model for user administration. In SONAS, NIS is used for netgroup and ID mapping only. Authentication is done via Active Directory, LDAP or Samba PDC.
SONAS already had synchronous replication, which was limited in distance. Now, SONAS v1.1.1 provides asynchronous replication, using rsync, at the file level. This is done over Wide Area Network (WAN) across to any other SONAS at any distance.
Interface modules can now be configured with either 64GB or 128GB of cache. Storage now supports both 450GB and 600GB SAS (15K RPM) and both 1TB and 2TB SATA (7200 RPM) drives. However, at this time, an entire 60-drive drawer must be either all one type of SAS or all one type of SATA. I have been pushing the architects to allow each 10-pack RAID rank to be independently selectable. For now, a storage pod can have 240 drives, 60 drives of each type of disk, to provide four different tiers of storage. You can have up to 30 storage pods per SONAS, for a total of 7200 drives.
An alternative to internal drawers of disk is a new "Gateway" iRPQ that allows the two storage nodes of a SONAS storage pod to connect via Fibre Channel to one or two XIV disk systems. You cannot mix and match; a storage pod is either all internal disk, or all external XIV. A SONAS gateway combined with external XIV is referred to as a "Smart Business Storage Cloud" (SBSC), which can be configured off premises and managed by third-party personnel so your IT staff can focus on other things.
See the Announcement Letters for the SONAS [hardware] and [software] for more details.
For those who are wondering how this positions against IBM's other NAS solution, the IBM System Storage N series, the rule of thumb is simple. If your capacity needs can be satisfied with a single N series box per location, use that. If not, consider SONAS instead. For those with non-IBM NAS filers that realize now that SONAS is a better approach, IBM offers migration services.
Both the Information Archive and the SONAS can be accessed from z/OS or Linux on System z mainframe, from "IBM i", AIX and Linux on POWER systems, all x86-based operating systems that run on System x servers, as well as any non-IBM server that has a supported NAS client.
Did IBM XIV force EMC's hand to announce VMAXe? Let's take a stroll down memory lane.
In 2008, IBM XIV showed the world that it could ship a Tier-1, high-end, enterprise-class system using commodity parts. Technically, prior to its acquisition by IBM, the XIV team had boxes out in production since 2005. EMC incorrectly argued this announcement meant the death of the IBM DS8000. Just because EMC was unable to figure out how to have more than one high-end disk product, doesn't mean IBM or other storage vendors were equally challenged. Both IBM XIV and DS8000 are Tier-1, high-end, enterprise-class storage systems, as are the IBM N series N7900 and the IBM Scale-Out Network Attached Storage (SONAS).
In April 2009, EMC followed IBM's lead with their own V-Max system, based on Symmetrix Engenuity code, but on commodity x86 processors. Nobody at EMC suggested that the V-Max meant the death of their other Symmetrix box, the DMX-4, which means that EMC proved to themselves that a storage vendor could offer multiple high-end disk systems. Hitachi Data Systems (HDS) would later offer the VSP, which also includes some commodity hardware as well.
In July 2009, analysts at International Technology Group published their TCO findings that IBM XIV was 63 percent less expensive than EMC V-Max, in a whitepaper titled [COST/BENEFIT CASE FOR IBM XIV STORAGE SYSTEM Comparing Costs for IBM XIV and EMC V-Max Systems]. Not surprisingly, EMC cried foul. Since the EMC V-Max had not yet had time to prove itself in the field, EMC felt it was too soon to compare newly minted EMC gear with a mature product like XIV that had been in production accounts for several years. Big companies like to wait for "Generation 1" of any new product to mature a bit before they purchase.
To compete against IBM XIV's very low TCO, EMC was forced to either deeply discount their Symmetrix, or counter-offer with lower-cost CLARiiON, their midrange disk offering. An ex-EMCer that now works for IBM on the XIV sales team put it in EMC terms -- "the IBM XIV provides a Symmetrix-like product at CLARiiON-like prices."
(Note: Somewhere in 2010, EMC dropped the hyphen, changing the name from V-Max to VMAX. I didn't see this formally announced anywhere, but it seems that the new spelling is the officially correct usage. A common marketing rule is that you should only rename failed products, so perhaps dropping the hyphen was EMC's way of preventing people from searching older reviews of the V-Max product.)
This month, IBM introduced the IBM XIV Gen3 model 114. The analysts at ITG updated their analysis, as there are now more customers that have either or both products, to provide a more thorough comparison. Their latest whitepaper, titled [Cost/Benefit Case for IBM XIV Systems: Comparing Cost Structures for IBM XIV and EMC VMAX Systems], shows that IBM maintains its substantial cost savings advantage, representing 69 percent less Total Cost of Ownership (TCO) than EMC, on average, over the course of three years.
In response, EMC announced its new VMAXe, following the naming convention EMC established for VNX and VNXe. Customers cannot upgrade VNXe to VNX, nor VMAXe to VMAX, so at least EMC was consistent in that regard. Like the IBM XIV and XIV Gen3, the new EMC VMAXe eliminated "unnecessary distractions" like CKD volumes and FICON attachment needed for the IBM z/OS operating system on IBM System z mainframes. Fellow blogger Barry Burke from EMC explains everything about the VMAXe in his blog post [a big thing in a small package].
So, you have to wonder, did IBM XIV force EMC's hand into offering this new VMAXe storage unit? Surely, EMC sales reps will continue to lead with the more profitable DMX-4 or VMAX, and then only offer the VMAXe when the prospective customer mentions that the IBM XIV Gen3 is 69 percent less expensive. I haven't seen any list or street prices for the VMAXe yet, but I suspect it is less expensive than VMAX, on a dollar-per-GB basis, so that EMC will not have to discount it as much to compete against IBM.
Full VMware vStorage API for Array Integration (VAAI). Back in 2008, VMware announced new vStorage APIs for its vSphere ESX hypervisor: vStorage API for Site Recovery Manager, vStorage API for Data Protection, vStorage API for Multipathing. Last July, VMware added a new API called vStorage API for Array Integration [VAAI] which offers three primitives:
Hardware-assisted Block zeroing. Sometimes referred to as "Write Same", this SCSI command will zero out a large section of blocks, presumably as part of a VMDK file. This can then be used to reclaim space on the XIV on thin-provisioned LUNs.
Hardware-assisted Copy. Make an XIV snapshot of data without any I/O on the server hardware.
Hardware-assisted locking. On mainframes, this is called Parallel Access Volumes (PAV). Instead of locking an entire LUN using standard SCSI reserve commands, this primitive allows an ESX host to lock just an individual block so as not to interfere with other hosts accessing other blocks on that same LUN.
Quality of Service (QoS) Performance Classes.
When XIV was first released, it treated all hosts and all data the same, even when deployed for a variety of different applications. This worked for some clients, such as [Medicare y Mucho Más]. They migrated their databases, file servers and email system from EMC CLARiiON to an IBM XIV Storage System. In conjunction with VMware, the XIV provides a highly flexible and scalable virtualized architecture, which enhances the company's business agility.
However, other clients were skeptical, and felt they needed additional "knobs" to prioritize different workloads. The new 10.2.4 microcode allows you to define four different "performance classes". This is like the door of a nightclub. All the regular people are waiting in a long line, but when a celebrity in a limo arrives, the bouncer unclips the cord, and lets the celebrity in. For each class, you provide IOPS and/or MB/sec targets, and the XIV manages to those goals. Performance classes are assigned to each host based on their value to the business.
Offline Initialization for Asynchronous Mirror.
Internally, we called this Truck Mode. Normally, when a customer decides to start using Asynchronous Mirror, they already have a lot of data at the primary location, and so there is a lot of data to send over to the new XIV box at the secondary location. This new feature allows the data to be dumped to tape at the primary location. Those tapes are shipped to the secondary location and restored on the empty XIV. The two XIV boxes are then connected for Asynchronous Mirroring, and checksums of each 64KB block are compared to determine what has changed at the primary during this "tape delivery time". This greatly reduces the time it takes for the two boxes to get past the initial synchronization phase.
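The checksum comparison can be pictured with a few lines of Python. This is only a sketch of the general technique, not the XIV implementation: the source does not say which checksum the XIV uses, so SHA-256 here is my own assumption, and the in-memory byte strings stand in for volumes.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # 64KB blocks, as described above


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE):
    """Return one checksum per fixed-size block (SHA-256 is an assumption)."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(primary: bytes, secondary: bytes, block_size: int = BLOCK_SIZE):
    """List the block numbers whose checksums differ between the two copies."""
    primary_sums = block_checksums(primary, block_size)
    secondary_sums = block_checksums(secondary, block_size)
    changed = [
        index
        for index, (p_sum, s_sum) in enumerate(zip(primary_sums, secondary_sums))
        if p_sum != s_sum
    ]
    # Blocks that exist on only one side also count as different.
    shorter = min(len(primary_sums), len(secondary_sums))
    longer = max(len(primary_sums), len(secondary_sums))
    changed.extend(range(shorter, longer))
    return changed
```

Only the blocks returned by `changed_blocks` would need to cross the WAN, which is why restoring from tape first shortens the initial synchronization so much.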
IP-based Replication. When IBM first launched the Storwize V7000 last October, people commented that the one feature they felt missing was IP-based replication. Sure, we offered FCP-based replication as most other Enterprise-class disk systems offer today, but many midrange systems also offer IP-based replication to reduce the need for expensive FCIP routers. [IBM Tivoli Storage FastBack for Storwize V7000] provides IP-based replication for Storwize V7000 systems.
Network Attached Storage
IBM announced two new models of the IBM System Storage N series. The midrange N6240 supports up to 600 drives, replacing the N6040 system. The entry-level N6210 supports up to 240 drives, and replaces the N3600 system. Details for both are available on the latest [data sheet].
IBM Real-Time Compression appliances work with all N series models to provide additional storage efficiency. Last October, I provided the [Product Name Decoder Ring] for the STN6500 and STN6800 models. The STN6500 supports 1 GbE ports, and the STN6800 supports 10GbE ports (or a mix of 10GbE and 1GbE, if you prefer). The IBM versions of these models were announced last December, but some people were on vacation and might have missed it. For more details of this, read the [Resources page], the [landing page], or [watch this video].
IBM System Storage DS3000 series
IBM System Storage [DS3524 Express DC and EXP3524 Express DC] models are powered with direct current (DC) rather than alternating current (AC). The DS3524 packs dual controllers and two dozen small form factor (2.5 inch) drives in a compact 2U-high rack-optimized module. The EXP3524 provides additional disk capacity that can be attached to the DS3524 for expansion.
Large data centers, especially those in the Telecommunications Industry, receive AC from their power company, then store it in a large battery called an Uninterruptible Power Supply (UPS). For DC-powered equipment, they can run directly off this battery source, but for AC-powered equipment, the DC has to be converted back to AC, and some energy is lost in the conversion. Thus, having DC-powered equipment is more energy efficient, or "green", for the IT data center.
Whether you get the DC-powered or AC-powered models, both are NEBS-compliant and ETSI-compliant.
New Tape Drive Options for Autoloaders and Libraries
IBM System Storage [TS2900 Autoloader] is a compact 1U-high tape system that supports one LTO drive and up to 9 tape cartridges. The TS2900 can support either an LTO-3, LTO-4 or LTO-5 half-height drive.
IBM System Storage [TS3100 and TS3200 Tape Libraries] were also enhanced. The TS3100 can accommodate one full-height LTO drive, or two half-height drives, and hold up to 24 cartridges. The TS3200 offers twice as many drives and space for cartridges.
This week, I am in beautiful Sao Paulo, Brazil, teaching a Top Gun class to IBM Business Partners and sales reps. Traditionally, we have "Tape Thursday" where we focus on our tape systems, from tape drives, to physical and virtual tape libraries. IBM is the number 1 tape vendor, and has been for the past eight years.
(The alliteration doesn't translate well here in Brazil. The Portuguese word for tape is "fita", and Thursday here is "quinta-feira", but "fita-quinta-feira" just doesn't have the same ring to it.)
In the class, we discussed how to handle common misperceptions and myths about tape. Here are a few examples:
Myth 1: Tape processing is manually intensive
In my July 2007 blog post [Times a Million], I coined the phrase "Laptop Mentality" to describe the problem most people have dealing with data center decisions. Many folks extend linearly their experiences using their PCs, workstations or laptops to apply to the data center, unable to comprehend large numbers or solutions that take advantage of the economies of scale.
For many, the only experience dealing with tape was manual. In the 1980s, we made "mix tapes" on little cassettes, and in the 1990s we recorded our favorite television shows on VHS tapes in the VCR. Today, we have playlists on flash or disk-based music players, and record TV shows on disk-based video recorders like Tivo. The conclusion people draw is that tapes are manual, and disks are not.
Manual processing of tapes ended in 1987, with the introduction of a silo-like tape library from StorageTek. IBM quickly responded with its own IBM 3495 Tape Library Data Server in 1992. Today, clients have many tape automation choices, from the smallest IBM TS2900 Tape Autoloader that has one drive and nine cartridges, all the way to the largest IBM TS3500 multiple-library shuttle complex that can hold exabytes of data. These tape automation systems eliminate most of the manual handling of cartridges in day-to-day operations.
Myth 2: Tape media is less reliable than disk media
A storage medium is unreliable if it returns information different from what was originally stored. There are only two ways for this to happen: you write a "zero" but read back a "one", or you write a "one" and read back a "zero". This is called a bit error. Every storage medium has a "bit error rate", the average likelihood of such an error over some large amount of data written.
According to the latest [LTO Bit Error rates, 2012 March], today's tape expects only 1 bit error per 10E17 bits written (about 100 Petabytes). This is 10 times more reliable than Enterprise SAS disk (1 bit per 10E16), and 100 times more reliable than Enterprise-class SATA disk (1 bit per 10E15).
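For readers who want to check the arithmetic, here is a small Python sketch. It assumes "10E17" is read literally as 10 x 10^17 bits, and that petabytes are decimal (10^15 bytes); both are my assumptions about the notation, and the function names are purely illustrative.

```python
# Bits written per expected bit error, taken from the figures quoted above.
# "10E17" is read literally as 10 x 10^17 (an assumption about the notation).
BITS_PER_ERROR = {
    "LTO tape": 10e17,
    "Enterprise SAS disk": 10e16,
    "Enterprise SATA disk": 10e15,
}

BYTES_PER_PETABYTE = 1e15  # decimal petabytes (assumption)


def petabytes_per_error(bits_per_error: float) -> float:
    """Convert 'bits written per bit error' into petabytes written per bit error."""
    return bits_per_error / 8 / BYTES_PER_PETABYTE


def expected_bit_errors(bytes_written: float, bits_per_error: float) -> float:
    """Average number of bit errors expected when writing this many bytes."""
    return bytes_written * 8 / bits_per_error


def reliability_ratio(medium: str, other: str) -> float:
    """How many times more data 'medium' writes per bit error than 'other'."""
    return BITS_PER_ERROR[medium] / BITS_PER_ERROR[other]
```

Under those assumptions, tape works out to 125 petabytes written per expected bit error, the same order of magnitude as the "about 100 Petabytes" quoted above, and the 10x and 100x ratios against SAS and SATA fall straight out of the exponents.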
Tape is the media used in "black boxes" for airplanes. When an airplane crashes, the black box is retrieved and used to investigate the causes of the crash. In 1986, the Space Shuttle Challenger exploded 73 seconds after take-off. The tapes in the black box sat on the ocean floor for six weeks before being recovered. Amazingly, IBM was able to successfully restore [90 percent of the block data, and 100 percent of voice data].
Analysts are quite upset when they are quoted out of context, but in this case, Gartner never said anything even close to this. Nor did the other analysts that Curtis investigated for similar claims. What Gartner did say was that disk provides an attractive alternative storage media for backup which can increase the performance of the recovery process.
Back in the 1990s, Savur Rao and I developed a patent to help backup DB2 for z/OS by using the FlashCopy feature of IBM's high-end disk system. The software method to coordinate the FlashCopy snapshots with the database application and maintain multiple versions was implemented in the DFSMShsm component of DFSMS. A few years later, this was part of a set of patents IBM cross-licensed to Microsoft for them to implement a similar software for Windows called Data Protection Manager (DPM). IBM has since introduced its own version for distributed systems called IBM Tivoli FlashCopy Manager that runs not just on Windows, but also AIX, Linux, HP-UX and Solaris operating systems.
Curtis suspects the "71 percent" citation may have been propagated by an ambitious product manager of Microsoft's Data Protection Manager, back in 2006, perhaps to help drive up business to their new disk-based backup product. Certainly, Microsoft was not the only vendor to disparage tape in this manner.
A few years ago, an [EMC failure brought down the State of Virginia]. It began with a component failure in its production disk system, and was made worse by the failure to recover from the disk-based remote mirror copy. Fortunately, the data was able to be restored from tape over the next four days. If you wonder why nobody at EMC says "Tape is Dead" anymore, perhaps it is because tape saved their butts that week.
(FTC Disclosure: I work for IBM and this post can be considered a paid, celebrity endorsement for all of the IBM tape and software products mentioned on this post. I own shares of stock in both IBM and Google, and use Google's Gmail for my personal email, as well as many other Google services. While IBM, Google and Microsoft can be considered competitors to each other in some areas, IBM has working relationships with both companies on various projects. References in this post to other companies like EMC are merely to provide illustrative examples only, based on publicly available information. IBM is part of the Linear Tape Open (LTO) consortium.)
Myth 4: Vendors and Manufacturers are no longer investing in tape technology
IBM and others are still investing Research and Development (R&D) dollars to improve tape technology. What people don't realize is that much of the R&D spent on magnetic media can be applied across both disk and tape, such as IBM's development of the Giant Magnetoresistance read/write head, or [GMR] for short.
Most recently, IBM made another major advancement with tape with the introduction of the Linear Tape File System (LTFS). This allows greater portability to share data between users, and between companies, by treating tape cartridges much like USB memory sticks or pen drives. You can read more in my post [IBM and Fox win an Emmy for LTFS technology]!
Next month, IBM celebrates the 60th anniversary for tape. It is good to see that tape continues to be a vibrant part of the IT industry, and to IBM's storage business!
I have gotten several emails expressing worry that I have fallen off the face of the earth. The last two weeks have been educational and eye-opening for me. I can't provide details in my blog, so I will just say that it involved government agencies that IBM refers to as "dark accounts", and that I am now back safely in the USA. Between adjusting to time zone differences, ridiculously long hours, and restricted access to the internet, I was unable to blog lately.
Instead, I will resume my coverage of the [IBM System Storage Technical University 2011]. The "Solutions Expo" runs Monday evening through Wednesday lunch. This is a chance for people to explore all the solutions that are part of IBM's large "eco-system" for IBM System storage and System x products. There were several sponsors for this event.
As is often the case at these conferences, the various booths hand out fun items. The hot items this year were tie-dyed tee-shirts from Qlogic, and propeller beanies from the IBM rack and power systems team. Here is Amanda, one of the bartenders showing off the latter.
After the expo on Tuesday night, my friends at [Texas Memory Systems] held an after-party. Unlike the pens, tee-shirts and keychains at the Expo, these guys had a raffle for real storage products. Here is Erik Eyberg handing out a RamSan PCIe card, valued at $14,000 or so. IBM recently certified the TMS RamSan as External SSD storage for the IBM SAN Volume Controller (SVC). The SVC can optimize performance using this for automated sub-LUN tiering with the IBM System Storage Easy Tier feature.
This week I got a comment on my blog post [IBM Announces another SSD Disk offering!]. The exchange involved Solid State Disk storage inside the BladeCenter and System x server line. Sandeep offered his amazing performance results, but we have no way to get in contact with him. So, for those interested, I have posted on SlideShare.net a quick five-chart presentation on recent tests with various SSD offerings on the eX5 product line here:
Wrapping up this week's theme on the XO laptop, I decided to take on the challenge of printing. I managed to print from my XO laptop to my laserjet printer. I checked the One Laptop Per Child [OLPC] website, and found there is no built-in support for printers, but there have been several people asking how to print from the XO, so here are the steps I did to make it happen.
(Note: I did all of these steps successfully on my Qemu-emulated system first, and then performed them on my XO laptop)
Step 1: Determine if you have an acceptable printer
The XO laptop can only connect to a printer via USB cable or over the network. Check your printer to see if it supports either of these two options. In my case, my printer is connected to my Linksys hub that offers Wi-Fi in my home.
The XO runs a modified version of Red Hat's Fedora 7, so we need to also determine if the printer is supported on Linux. Check the [Open Printing Database] for the level of support. This database has come up with the following ranking system. Printers are categorized according to how well they work under Linux and Unix. The ratings do not pertain to whether or not the printer will be auto-recognized or auto-configured, but merely to the highest level of functionality achieved.
Perfectly - everything the printer can do is working also under Linux
Mostly - work almost perfectly - funny enhanced resolution modes may be missing, or the color is a bit off, but nothing that would make the printouts not useful
Partially - mostly don't work; you may be able to print only in black and white on a color printer, or the printouts look horrible
Paperweight - These printers don't work at all. They may work in the future, but don't count on it
If your printer only supports a parallel cable connection, or does not have a high enough ranking above, go buy another printer. The [Linux Foundation] website offers a list of suggested printers and tutorials.
In my case, I have a Brother HL5250-DN black-and-white laserjet printer connected over a network to Windows XP, OS X and my other Linux systems. It is rated as supporting Linux perfectly, so I decided to use this for my XO laptop.
Step 2: Install Common UNIX Printing System (CUPS)
Technically, Linux is not UNIX, but for our purposes, close enough. Start the Terminal activity, use "su" to change to root, and then use "yum" to install CUPS. Yum will automatically determine what other packages are needed, in this case paps and tmpwatch. Once installed, use "/usr/sbin/cupsd" to get the CUPS daemon started, and add this to the end of rc.local so that it gets started every time you reboot.
[olpc@xo-10-CC-6F ~]$ su
bash-3.2# yum install cups
...
Total download size = 3.0 M
Is this OK [y/N]? y
To download the appropriate drivers, you may need a browser that can handle file downloads. I have tried to do this with the built-in Browse activity (aka Gecko) but encountered problems. I have both Opera and Firefox installed, but I will focus on Opera for this effort. I also installed the older 22.214.171.124 version of the Flash player (it worked better than the latest 126.96.36.199 version) and the Java JRE. Follow the OLPC Wiki instructions for [Opera, Adobe Flash, and Sun Java] installation, then verify with the following [Java and Flash] testers.
Step 4: Download drivers and packages unique for your printer
In my case, I used Opera to get to the [Brother Linux Driver Homepage], and downloaded the RPMs for the LPR and CUPS wrapper drivers. These are the ones listed under "Drivers for Red Hat, Mandrake (Mandriva), SuSE". I saved these under the "/home/olpc" directory.
By default, the root user has no password. However, you will need it to be something for later steps, so here is the process to create a root password. I set mine to "tony" which normally would be considered too simple a password, but ignore those messages and continue. We will remove it in step 8 (below) to put things back to normal.
[olpc@xo-10-CC-6F ~]$ su
bash-3.2# passwd
Changing password for user root.
New UNIX password: tony
BAD PASSWORD: it is too short
Retype new UNIX password: tony
passwd: all authentication tokens updated successfully
bash-3.2# exit
[olpc@xo-10-CC-6F ~]$
Step 6: Launch CUPS administration
Here I followed the instructions in Robert Spotswood's [Printing In Linux with CUPS] tutorial. Launch the Opera browser, and enter "http://localhost:631/admin" as the URL. The localhost refers to the laptop itself, and 631 is the special port on which CUPS listens for browser requests. You can also use 127.0.0.1 in place of "localhost"; the two can be used interchangeably.
In my case, it detected both of my networked printers, so I selected the HL5250DN, and entered the location of my PPD file "/usr/share/cups/model/HL5250DN.ppd" that was created in Step 4. I set the URI to "lpd://192.168.0.75/binary_p1" per the instructions [Network Setting in CUPS based Linux system] in the Brother FAQ page. I changed the page size from "A4" to "Letter". I set this printer as the default printer. When it asks for userid and password, that is where you would enter "root" for the user, and "tony" or whatever you decided to set your root password to.
Select "Print a Test Page" to verify that everything is working.
Step 7: Printing actual files
Sadly, I don't know Opera well enough to know how to print from there. So, I went over to my trusted Firefox browser. Select File->Page Setup to specify the settings, File->Print Preview to see what it will look like, and then File->Print to send it to the printer.
To print the file "out.txt" that is in your /home/olpc directory, for example, enter "file:///home/olpc/out.txt" as the URL of the Firefox browser. This will show the file, which you can then print to your printer. I had to specify 200% scaling, otherwise the fonts were too small to read.
Step 8: Remove the "root" password
If you want to remove the root password, here are the steps.
[olpc@xo-10-CC-6F ~]$ su
Password: tony
bash-3.2# passwd -d root
Removing password for user root.
passwd: Success
bash-3.2# exit
[olpc@xo-10-CC-6F ~]$
Now the problem is that there is no way to print stuff from any of the Sugar activities. The best place to put in print support would be the Journal activity. Along the bottom where the mounted USB keys are located could be an icon for a printer, and dragging a file down to the printer object could cause it to be sent to the printer.
The alternative is to write some scripts invocable from the Terminal activity to determine what is in the journal, and send them to LPR with the appropriate parameters.
I did not have time to do either of these, but perhaps someone out there can take on that as a project.
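As a possible starting point for the script-based alternative, here is a minimal Python sketch that builds "lpr" command lines for the files in a directory. It only constructs the commands and does not run them. The printer queue name, the directory and the file pattern are placeholders of my own; I have not worked out where the Journal actually keeps its files, so that part is left to whoever picks this up.

```python
from pathlib import Path


def build_lpr_command(path, printer, copies=1):
    """Build an lpr command line for one file (the printer name is a placeholder)."""
    command = ["lpr", "-P", printer]
    if copies > 1:
        command += ["-#", str(copies)]  # lpr's option for number of copies
    command.append(str(path))
    return command


def commands_for_directory(directory, printer, pattern="*.txt"):
    """Build one lpr command per matching file, in sorted order."""
    return [
        build_lpr_command(path, printer)
        for path in sorted(Path(directory).glob(pattern))
    ]
```

Each command list could then be handed to `subprocess.run(command, check=True)` from a script launched in the Terminal activity.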
If we have learned anything from last decade's Y2K crisis, it is that we should not wait until the last minute to take action. Now is the time to start thinking about weaning ourselves off Windows XP. IBM has 400,000 employees, so this is not a trivial matter.
Already, IBM has taken some bold steps:
Last July, IBM announced that it was switching from Internet Explorer (IE6) to [Mozilla Firefox as its standard browser]. IBM has been contributing to this open source project for years, including support for open standards, and to make it [more accessible to handicapped employees with visual and motor impairments]. I use Firefox already on Windows, Mac and Linux, so there was no learning curve for me. Before this announcement, if some web-based application did not work on Firefox, our Helpdesk told us to switch back to Internet Explorer. Those days are over. Now, if a web-based application doesn't work on Firefox, we either stop using it, or it gets fixed.
IBM also announced the latest [IBM Lotus Symphony 3] software, which replaces Microsoft Office for Powerpoint, Excel and Word applications. Symphony also works across Mac, Windows and Linux. It is based on the OpenOffice open source project, and handles open-standard document formats (ODF). Support for Microsoft Office 2003 will also run out in the year 2014, so moving off proprietary formats to open standards makes sense.
I am not going to wait for IBM to decide how to proceed next, so I am starting my own migrations. In my case, I need to do it twice, on my IBM-provided laptop as well as my personal PC at home.
Last summer, IBM sent me a new laptop; we get a new one every 3-4 years. It was pre-installed with Windows XP, but powerful enough to run a 64-bit operating system in the future. Here is my series of blog posts on that:
I decided to try out Red Hat Enterprise Linux 6.1 with its KVM-based Red Hat Enterprise Virtualization to run Windows XP as a guest OS. I will try to run as much as I can on native Linux, but will have Windows XP guest as a next option, and if that still doesn't work, reboot the system in native Windows XP mode.
So far, I am pleased that I can do nearly everything my job requires natively in Red Hat Linux, including accessing my Lotus Notes for email and databases, edit and present documents with Lotus Symphony, and so on. I have made RHEL 6.1 my default when I boot up. Setting up Windows XP under KVM was relatively simple, involving an 8-line shell script and 54-line XML file. Here is what I have encountered:
We use a wonderful tool called "iSpring Pro" which merges Powerpoint slides with voice recordings for each page into a Shockwave Flash video. I have not yet found a Linux equivalent for this.
To avoid having to duplicate files between systems, I use symbolic links instead. For example, my Lotus Notes local email repository sits on D: drive, but I can access it directly with a link from /home/tpearson/notes/data.
While my native Ubuntu and RHEL Linux can access my C:, D: and E: drives in native NTFS file system format, the irony is that my Windows XP guest OS under KVM cannot. This means moving something from NTFS over to Ext4, just so that I can access it from the Windows XP guest application.
For whatever reason, "Password Safe" did not run on the Windows XP guest. I launch it, but it takes forever to load and never brings up the GUI. Fortunately, there is a Linux version [MyPasswordSafe] that seems to work just fine to keep track of all my passwords.
Personal home PC
My Windows XP system at home gave up the ghost last month, so I bought a new system with Windows 7 Professional, quad-core Intel processor and 6GB of memory. There are [various editions of Windows 7], but I chose Windows 7 Professional to support running Windows XP as a guest image.
Here is how I have configured my personal computer:
I actually found it more time-consuming to implement the "Virtual PC" feature of Windows 7 to get Windows XP mode working than KVM on Red Hat Linux. I am amazed how many of my Windows XP programs DO NOT RUN AT ALL natively on Windows 7. I now have native 64-bit versions of Lotus Notes and Symphony 3, which will do well enough for me for now.
I went ahead and put Red Hat Linux on my home system as well, but since I have Windows XP running as a guest under Windows 7, no need to duplicate KVM setup there. At least if I have problems with Windows 7, I can reboot in RHEL6 Linux at home and use that for Linux-native applications.
Hopefully, this will position me well in case IBM decides to go with either Windows 7 or Linux as the replacement OS for Windows XP.
Well, it's Tuesday, and you know what that means... IBM announcements!
In today's environment, clients expect more from their storage, and from their storage provider. The announcements span the gamut, from helping to use Business Analytics to analyze Big Data for trends, insights and patterns, to managing private, public and hybrid cloud environments, all with systems that are optimized for their particular workloads.
There are over a dozen different announcements, so I will split these up into separate posts. Here is part 1.
IBM Scale Out Network Attached Storage (SONAS) R1.3
I have covered [IBM SONAS] for quite some time now. Based on IBM's General Parallel File System (GPFS), this integrated system combines servers, storage and software into a fully functional scale-out NAS solution that supports NFS, CIFS, FTP/SFTP, HTTP/HTTPS, and SCP protocols. IBM continues its technical leadership in the scale-out NAS marketplace with new hardware and software features.
The hardware adds new disk options, with 900GB SAS 15K RPM drives, and 3TB NL-SAS 7200 RPM drives. These come in 4U drawers of 60 drives each, six ranks of ten drives each. So, with the high-performance SAS drives that would be about 43TB usable capacity per drawer, and with the high-capacity NL-SAS drives about 144TB usable. You can have any mix of high-performance drawers and high-capacity drawers, up to 7200 drives, for a maximum capacity of 17PB usable (21PB for those who prefer it raw). This makes it the largest commercial scale-out NAS in the industry. This capacity can be made into one big file system, or divided into up to 256 smaller file systems.
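The capacity figures above can be reproduced with a little arithmetic. The sketch below assumes that each ten-drive rank gives up two drives' worth of space to parity (an 8+P+Q layout). The post does not state the RAID scheme, so that ratio is an assumption, chosen only because it matches the quoted numbers. The function and constant names are illustrative.

```python
# Rough usable-capacity arithmetic for the SONAS drawer figures quoted above.
# ASSUMPTION: each 10-drive rank holds 8 drives' worth of data (8+P+Q);
# the post does not state the RAID layout.

DRIVES_PER_RANK = 10
DATA_DRIVES_PER_RANK = 8   # assumed, not taken from the post
RANKS_PER_DRAWER = 6
MAX_DRIVES = 7200


def usable_tb(drive_count, drive_tb):
    """Usable capacity in decimal TB for a whole number of ranks."""
    ranks = drive_count // DRIVES_PER_RANK
    return ranks * DATA_DRIVES_PER_RANK * drive_tb


def raw_tb(drive_count, drive_tb):
    """Raw capacity in decimal TB, before any parity overhead."""
    return drive_count * drive_tb


drawer_drives = DRIVES_PER_RANK * RANKS_PER_DRAWER  # 60 drives per 4U drawer

print(f"SAS drawer usable:    {usable_tb(drawer_drives, 0.9):.1f} TB")
print(f"NL-SAS drawer usable: {usable_tb(drawer_drives, 3):.1f} TB")
print(f"Max usable (NL-SAS):  {usable_tb(MAX_DRIVES, 3) / 1000:.2f} PB")
print(f"Max raw (NL-SAS):     {raw_tb(MAX_DRIVES, 3) / 1000:.2f} PB")
```

Under that assumption the results land on roughly 43TB and 144TB per drawer, and a little over 17PB usable and 21PB raw at 7200 drives, which lines up with the rounded figures in the announcement.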
In addition to snapshots of each file system, you can divide the file system up into smaller tree branches and snapshot these independently as well. The tree branches are called fileset containers. Furthermore, you can now make writeable clones of individual files, which provides a space-efficient way to create copies for testing, training or whatever.
Performance is improved in many areas. The interface nodes now can support a second dual-port 10GbE, and replication performance is improved by 10x.
SONAS supports access-based enumeration, which means that if there are 100 different subdirectories, but you only have authority to access five of them, then that's all you see, those five directories. You don't even know the other 95 directories exist.
I saved the coolest feature for last: Active Cloud Engine™, which offers both local and global file management. Locally, Active Cloud Engine uses placement rules to decide what type of disk a new file should be placed on, and management rules to move files from one disk type to another, or even migrate the data to tape or other externally-managed storage! A high-speed scan engine can rip through 10 million files per node, to identify files that need to be moved, backed up or expired.
Globally, Active Cloud Engine makes the global namespace truly global, allowing the file system to span multiple geographic locations. Built-in intelligence moves individual files to where they are closest to the users that use them most. This includes an intelligent push-over-WAN write cache, on-demand pull-from-WAN cache for reads, and will even pre-fetch subsets of files.
No other scale-out NAS solution from any other storage vendor offers this amazing and awesome capability!
IBM® Storwize® V7000
Last year, we introduced the [IBM Storwize V7000], a midrange disk system with block-level access via FCP and iSCSI protocols. The 2U-high control enclosure held two canister nodes, a 12-drive or 24-drive bay, and a pair of power-supply/battery UPS modules. The controller could attach up to nine expansion enclosures for more capacity, as well as virtualize other storage systems. This has been one of our most successful products ever, selling over 100PB in the past 12 months to over 2,500 delighted customers.
The 12-drive enclosure now supports both 2TB and 3TB NL-SAS drives. The 24-drive enclosures support 200/300/400GB Solid-State Drives (SSD), 146 and 300GB 15K RPM drives, 300/450/600GB 10K RPM drives, and a new 1TB NL-SAS drive option. For those who want to set up "Flash-and-Stash" in a single 2U drawer, now you can combine SSD and NL-SAS in the 24-drive enclosure! This is the perfect platform for IBM's Easy Tier sub-LUN automated tiering. IBM's Easy Tier is substantially more powerful and easier to use than EMC's FAST-VP or HDS's Dynamic Tiering.
Last week, at Oracle OpenWorld, there were various vendors hawking their DRAM/SSD-only disk systems, including my friends at Texas Memory Systems, Pure Storage, and Violin Memory Systems. When people came to the IBM booth to ask what IBM offers, I explained that both the IBM DS8000 and the Storwize V7000 can be outfitted in this manner. With the Storwize V7000, you can buy as much or little SSD as you like. You do not have to buy these drives in groups of 8 or 16 at a time.
The Storwize V7000 is the sister product of the IBM SAN Volume Controller, so you can replicate between one and the other. I see two use cases for this. First, you might have an SVC at a primary location, and decide to replicate just the subset of mission-critical production data to a remote location, and use the Storwize V7000 as the target device. Secondly, you could have three remote or branch offices (ROBO) that replicate to a centralized data center SAN Volume Controller.
Lastly, like the SVC, the Storwize V7000 now supports clustering, so you can combine multiple control enclosures together to make a single system.
IBM® Storwize® V7000 Unified
Do you remember how IBM combined the best of SAN Volume Controller, XIV and DS8000 RAID into the Storwize V7000? Well, IBM did it again, combining the best of the Storwize V7000 with the common NAS software base developed for SONAS into the new "Storwize V7000 Unified".
You can upgrade your block-only Storwize V7000 into a file-and-block "Storwize V7000 Unified" storage system. This is a 6U-high system, consisting of a pair of 2U-high file modules connected to a standard 2U-high control enclosure. Like the block-only version, the control enclosure can attach up to nine expansion enclosures, as well as all the same support to virtualize external disk systems. The file modules combine the management node, interface node and storage node functionality that SONAS R1.3 offers.
What exactly does that mean for you? In addition to FCP and iSCSI for block-level LUNs, you can carve out file systems that support NFS, CIFS, FTP/SFTP, HTTP/HTTPS, and SCP protocols. The same support as SONAS for anti-virus checking, access-based enumeration, integrated TSM backup and HSM functionality to migrate data to tape, NDMP backup support for other backup software, and Active Cloud Engine's local file management is all included!
IBM SAN Volume Controller V6.3
The SAN Volume Controller [SVC] increases its stretched cluster to distances up to 300km. This is 3x further than EMC's VPLEX offering. This allows identical copies of data to be kept in both locations, and allows for Live Partition Mobility or VMware vMotion to move workloads seamlessly from one data center to another. Combining two data centers with an SVC stretched cluster is often referred to as "Data Center Federation".
The SVC also introduces a low-bandwidth option for Global Mirror. We actually borrowed this concept from our XIV disk system. Normally, SVC's Global Mirror will consume all the bandwidth it can to keep the destination copy of the data within a few seconds of currency behind the source copy. But do you always need to be that current? Can you afford the bandwidth requirements needed to keep up with that? If you answered "No!" to either of these, then the low-bandwidth option is for you. Basically, a FlashCopy is done on the source copy, this copy is then sent over to the destination, and a FlashCopy is made of that. The process is then repeated on a scheduled basis, like every four hours. This greatly reduces the amount of bandwidth required, and for many workloads, having currency in hours, rather than seconds, is good enough.
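One reason a scheduled cycle needs less bandwidth is that a block overwritten many times between cycles only has to be shipped once. Here is a toy comparison, assuming a made-up list of writes. It models the idea only. It is not how SVC implements or configures Global Mirror, and the function names are hypothetical.

```python
# Toy comparison of continuous mirroring vs. periodic snapshot-based cycles.
# This models the idea only; it is NOT the SVC implementation or its CLI.
from collections import defaultdict


def blocks_sent_continuous(writes):
    """Every write is shipped to the remote site as it happens."""
    return len(writes)


def blocks_sent_cycled(writes, cycle_seconds):
    """Each cycle ships every block changed during that cycle exactly once."""
    changed = defaultdict(set)
    for timestamp, block in writes:
        changed[timestamp // cycle_seconds].add(block)
    return sum(len(blocks) for blocks in changed.values())


# Hypothetical workload: (seconds since start, block number).
writes = [(0, 7), (60, 7), (120, 7), (200, 9), (15000, 7), (15060, 7)]
four_hours = 4 * 60 * 60

print(blocks_sent_continuous(writes))          # 6
print(blocks_sent_cycled(writes, four_hours))  # 3
```

In this made-up workload, block 7 is rewritten three times in the first four-hour window but crosses the wire only once, so the cycled approach ships half as many blocks. The price is that the remote copy can be up to one cycle behind.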
I am very excited about all these announcements! It is a good time to be working for IBM, and I look forward to sharing these exciting enhancements with clients at the Tucson EBC.
Well, it's Tuesday again, and you know what that means! IBM Announcements!
Today, IBM announced its latest IBM Tivoli Key Lifecycle Manager (TKLM) 2.0 version. Here's a quick recap:
Centralized Key Management
Centralized and simplified encryption key management through Tivoli Key Lifecycle Manager's lifecycle of creation, storage, rotation, and protection of encryption keys and key serving through industry standards. TKLM is available to manage the encryption keys for LTO-4, LTO-5, TS1120 and TS1130 tape drives enabled for encryption, as well as DS8000 and DS5000 disk systems using Full Disk Encryption (FDE) disk drives.
Partitioning of Access Control for Multitenancy
Access control and partitioning of the key serving functions, including end-to-end authentication of encryption clients and security of exchange of encryption keys, such that groups of devices have different sets of encryption keys with different administrators. This enables [multitenancy] or multilayer security of a shared infrastructure using encryption as an enforcement mechanism for access control. As Information Technology shifts from on-premises to the cloud, multitenancy will become increasingly important.
Support for KMIP 1.0 Standard
Support for the new key management standard, Key Management Interoperability Protocol (KMIP), released through the Organization for the Advancement of Structured Information Standards [OASIS]. This new standard enables encryption key management for a wide variety of devices and endpoints. See the [22-page KMIP whitepaper] for more information.
As much as I like to poke fun at Oracle, with hundreds of their Sun/StorageTek clients switching over to IBM tape solutions every quarter, I have to give them kudos for working cooperatively with IBM to come up with this KMIP standard that we can both support.
Support for non-IBM devices from Emulex, Brocade and LSI
Support for IBM self-encrypting storage offerings as well as suppliers of IT components which support KMIP, including a number of supported non-IBM devices announced by business partners such as Emulex, Brocade, and LSI. KMIP support permits you to deploy Tivoli Key Lifecycle Manager without having to worry about being locked into a proprietary key management solution. If you are a client with multiple "Encryption Key Management" software packages, now is a good time to consolidate onto IBM TKLM.
Role-based access control for administrators that allows multiple administrators with different roles and permissions to be defined, helping increase the security of sensitive key management operations and improve separation of duties. For example, that new-hire college kid might get a read-only authorization level, so that he can generate reports, and pack the right tapes into cardboard boxes. Meanwhile, for that storage admin who has been running the tape operations for the past ten years, she might get full access. The advantage of role-based authorization is that for large organizations, you can assign people to their appropriate roles, and you can designate primary and secondary roles in case one has to provide backup while the other is out of town, for example.
Well, it's Tuesday again, and you know what that means... IBM announcements! Yesterday, at the IBM Edge conference here in Orlando, Florida, IBM announced its new approach to storage, and a whole bunch of storage products, enhancements, and services. I will focus on some key ones here, and save the rest for next week.
IBM SAN Volume Controller (SVC) v6.4
The SVC is IBM's enterprise-class storage hypervisor. The latest software release, v6.4, can be installed on any SVC hardware, from the 2145-8F2 introduced back in 2005, to newer models like the 2145-CG8. Here are the key features:
Fibre Channel over Ethernet (FCoE) -- This is complete end-to-end support. For SVC units with 10GbE ports, these ports can now be used for FCoE. This allows hosts to attach to SVC via FCoE, allows SVC node-to-node communication for clustering, and allows SVC to communicate to back-end devices via FCoE.
Real-Time Compression -- IBM ported over the patented Random Access Compression Engine (RACE) from the Real-Time Compression Appliances to SVC v6.4. This allows primary data, accessed via block-based protocols, to be compressed up to 80 percent. This is an extra-cost feature, priced per TB.
Non-Disruptive Volume move between I/O Groups -- If you don't already have SVC, you don't need to worry about this. For existing SVC customers, this allows volumes to be associated with two or more I/O groups, so that you can add or remove I/O groups non-disruptively. For example, if you want to move a volume from IOG1 to IOG2, then you add IOG2 to the list of I/O groups for the volume, let the multi-pathing software discover the additional paths, then remove IOG1, which then marks the previous IOG1 paths inactive. All this can be done while applications read and write data.
Dedicate FCP ports for Replication -- If you activate the two 10GbE ports for FCoE, you can free up two FCP ports that you can dedicate for long-distance Metro Mirror or Global Mirror.
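The order of operations in the non-disruptive volume move matters: the host has to see paths through the new I/O group before the old one is taken away. Here is a toy model of that sequence. The class and method names are hypothetical and only illustrate the steps described above. This is not the SVC command-line interface or any IBM API.

```python
# Toy model of the non-disruptive move between I/O groups described above.
# Class and method names are hypothetical; this is NOT the SVC CLI or API.


class Volume:
    def __init__(self, name, io_group):
        self.name = name
        self.access_groups = [io_group]  # I/O groups allowed to serve the volume
        self.host_paths = {io_group}     # groups the host multipath driver knows

    def add_access(self, io_group):
        if io_group not in self.access_groups:
            self.access_groups.append(io_group)

    def rescan_paths(self):
        """Host multipathing discovers paths through every access group."""
        self.host_paths = set(self.access_groups)

    def remove_access(self, io_group):
        remaining = [g for g in self.access_groups if g != io_group]
        # Refuse to strand the host: it must already see a path elsewhere.
        if not any(g in self.host_paths for g in remaining):
            raise RuntimeError("host has no path through another I/O group")
        self.access_groups = remaining
        self.host_paths.discard(io_group)  # the old paths go inactive


def move_volume(volume, source, target):
    volume.add_access(target)     # step 1: add the new I/O group
    volume.rescan_paths()         # step 2: let multipathing find the new paths
    volume.remove_access(source)  # step 3: only then drop the old I/O group


vol = Volume("db01", "IOG1")
move_volume(vol, "IOG1", "IOG2")
print(vol.access_groups)  # ['IOG2']
```

Skipping the rescan in this model raises an error instead of cutting the host off, which is the whole point of doing the steps in that order.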
If you have SVC today, but are running an old release like v4.3 or v5.1, I recommend you upgrade to at least the v6.2.05 release now. This release has been out for a year and is very stable, and serves as a great platform for a later upgrade to SVC v6.4.
IBM Storwize V7000 v6.4
The Storwize V7000 is IBM's midrange storage hypervisor. The latest software release, v6.4, can be installed on existing block-only Storwize V7000 units in the field. The Storwize V7000 v6.4 gets all the features listed above, as well as the following:
Four-way clustering -- Previously, you could cluster two Storwize V7000 controller enclosures together (4 canisters total). To cluster three or four controllers required an RPQ. Now, IBM supports up to four Storwize V7000 controller enclosures (8 canisters) without an RPQ.
Direct Fibre Channel attach -- A lot of people are using Storwize V7000 inside single-rack configurations, so it makes sense not to require a SAN switch for just a few Windows, Linux or VMware servers. An RPQ is now available to allow this to happen.
IBM Tivoli Storage Productivity Center (TPC) v5.1
TPC is already ranked one of the best Storage Infrastructure Management software products on the market, and this release will just solidify its lead. Key features include:
Upward integration to higher level management systems
A new, intuitive, easy-to-use web-based GUI inspired by the XIV GUI
Integration of COGNOS to be able to generate and customize reports
Support for SONAS systems
There are several presentations on TPC this week that will go into more detail. Check out the [TPC Facebook page].
My latest book Inside System Storage: Volume IV is now available!
Yes, can you believe it? I have published my fourth volume in my "Inside System Storage" series! It is available in three formats:
Hardcover with dust jacket
eBook (Adobe Acrobat PDF)
You can order this, and all my other books, in all formats, directly from my [Author Spotlight] page. The paperback will also be available soon from other online booksellers; search for ISBN 978-1-105-72213-4.
IBM DS3500 Express
The DS3500 is our entry-level block-based device, designed specifically for random I/O workloads. This includes databases, email repositories, traditional business applications, and on-line transactional workloads. Here are the new features:
Dynamic Disk Pooling, similar to what XIV does to reduce disk rebuild times, but using a RAID-6 like approach per chunk of data.
Thin Provisioning using Dynamic Disk Pooling
Asymmetric Logical Unit Access (ALUA) failover
Enhanced FlashCopy, improved scalability, consistency groups and rollback support
VMware vStorage APIs for Array Integration (VAAI) support. This includes Write Same, Extended Copy, and Atomic Test & Set.
The DS3500 replaces the previous DS3200, DS3300 and DS3400 models.
The DCS3700 is our entry-level/midrange block-based device, replacing the DCS9900 model, designed specifically for sequential I/O workloads. This includes Big Data analytics, Hadoop, High Performance Computing (HPC), video surveillance, and television broadcasting. It holds 60 drives in a 4U controller enclosure.
My colleagues, Harley Puckett (left) and Jack Arnold (right) were highlighted in today's Arizona Daily Star, our local newspaper, as part of an article on IBM's success and leadership in the IT storage industry. At 1400 employees here in Tucson, IBM is Southern Arizona's 36th largest employer.
Highlighted in the article:
DS8700 with the new Easy Tier feature
TS7650 ProtecTIER virtual tape library with data deduplication capability
LTO-5 tape and the new Long Term File System (LTFS)
XIV with the new 2TB drive, for a maximum per-rack usable capacity of 161 TB.
In his last post in this series, he mentions that the amazingly successful IBM SAN Volume Controller was part of a set of projects:
"IBM was looking for "new horizon" projects to fund at the time, and three such projects were proposed and created the "Storage Software Group". Those three projects became know externally as TPC, (TotalStorage Productivity Center), SanFS (SAN File System - oh how this was just 5 years too early) and SVC (SAN Volume Controller). The fact that two out of the three of them still exist today is actually pretty good. All of these products came out of research, and its a sad state of affairs when research teams are measured against the percentage of the projects they work on, versus those that turn into revenue generating streams."
But this raises the question: Was SAN File System just five years too early?
IBM classifies products into three "horizons": Horizon-1 for well-established mature products, Horizon-2 for recently launched products, and Horizon-3 for emerging business opportunities (EBO). Since I had some involvement with these other projects, I thought I would help fill out some of this history from my perspective.
Back in 2000, IBM executive [Linda Sanford] was in charge of IBM's storage business and presented that IBM Research was working on the concept of "Storage Tank" which would hold Petabytes of data accessible to mainframes and distributed servers.
In 2001, I was the lead architect of DFSMS for the IBM z/OS operating system for mainframes, and was asked to be lead architect for the new "Horizon 3" project to be called IBM TotalStorage Productivity Center (TPC), which has since been renamed to IBM Tivoli Storage Productivity Center.
In 2002, I was asked to lead a team to port the "SANfs client" for SAN File System from Linux-x86 over to Linux on System z. How easy or difficult it is to port any code depends on how well it was written with the intent to be ported, and porting the "proof-of-concept" level code proved a bit too challenging for my team of relative new-hires. Once code written by research scientists is sufficiently complete to demonstrate proof of concept, it should be entirely discarded and written from scratch by professional software engineers that follow proper development and documentation procedures. We reminded management of this, and they decided not to make the necessary investment to add Linux on System z as a supported operating system for SAN File System.
In 2003, IBM launched Productivity Center, SAN File System and SAN Volume Controller. These would be lumped together with Horizon-1 product IBM Tivoli Storage Manager and the four products were promoted together as the inappropriately-named [TotalStorage Open Software Family]. We actually had long meetings debating whether SAN Volume Controller was hardware or software. While it is true that most of the features and functions of SAN Volume Controller are driven by its software, it was never packaged as a software-only offering.
The SAN File System was the productized version of the "Storage Tank" research project. While the SAN Volume Controller used industry standard Fibre Channel Protocol (FCP) to allow support of a variety of operating system clients, the SAN File System required an installed "client" that was only available initially on AIX and Linux-x86. In keeping with the "open" concept, an "open source reference client" was made available so that the folks at Hewlett-Packard, Sun Microsystems and Microsoft could port this over to their respective HP-UX, Solaris and Windows operating systems. Not surprisingly, none were willing to voluntarily add yet another file system to their testing efforts.
Barry argues that SANfs was five years ahead of its time. SAN File System tried to bring policy-based management for information, which has been part of DFSMS for z/OS since the 1980s, over to distributed operating systems. The problem is that mainframe people who understand and appreciate the benefits of policy-based management already had it, and non-mainframe people couldn't understand the benefits of something they have managed to survive without.
(Every time I see VMware presented as a new or clever idea, I have to remind people that this x86-based hypervisor basically implements the mainframe concept of server virtualization introduced by IBM in the 1970s. IBM is the leading reseller of VMware, and supports other server virtualization solutions including Linux KVM, Xen, Hyper-V and PowerVM.)
To address the various concerns about SAN File System, the proof-of-concept code from IBM Research was withdrawn from marketing, and fresh code implementing these concepts was integrated into IBM's existing General Parallel File System (GPFS). This software would then be packaged with a server hardware cluster, exporting global file spaces with broad operating system reach. Initially offered as the IBM Scale-out File Services (SoFS) service offering, this was later re-packaged as an appliance, the IBM Scale-Out Network Attached Storage (SONAS) product, and as the IBM Smart Business Storage Cloud (SBSC) cloud storage offering. These now offer clustered NAS storage using the industry standard NFS and CIFS clients that nearly all operating systems already have.
Today, these former Horizon-3 products are now Horizon-2 and Horizon-1. They have evolved. Tivoli Storage Productivity Center, GPFS and SAN Volume Controller are all market leaders in their respective areas.
Continuing coverage of my week in Washington DC for the annual [2010 System Storage Technical University], I attended several XIV sessions throughout the week. There were many XIV sessions. I could not attend all of them. Jack Arnold, one of my colleagues at the IBM Tucson Executive Briefing Center, often presents XIV to clients and Business Partners. He covered all the basics of XIV architecture, configuration, and features like snapshots and migration. Carlos Lizarralde presented "Solving VMware Challenges with XIV". Ola Mayer presented "XIV Active Data Migration and Disaster Recovery".
Here is my quick recap of two in particular that I attended:
XIV Client Success Stories - Randy Arseneau
Randy reported that IBM had its best quarter ever for the XIV, reflecting an unexpected surge shortly after my blog post debunking the DDF myth last April. He presented successful case studies of client deployments. Many followed a familiar pattern. First, the client would only purchase one or two XIV units. Second, the client would beat the crap out of them, putting all kinds of stress from different workloads. Third, the client would discover that the XIV is really as amazing as IBM and IBM Business Partners have told them. Finally, in the fourth phase, the client would deploy the XIV for mission-critical production applications.
A large US bank holding company managed to get 5.3 GB/sec from a pair of XIV boxes for their analytics environment. They now have 14 XIV boxes deployed in mission-critical applications.
A large equipment manufacturer compared the offerings among seven different storage vendors, and IBM XIV came out the winner. They now have 11 XIV boxes in production and another four boxes for development/test. They have moved their entire VMware infrastructure to IBM XIV, running over 12,000 guest instances.
A financial services company bought their first XIV in early 2009 and now has 34 XIV units in production attached to a variety of Windows, Solaris, AIX, Linux servers and VMware hosts. Their entire Microsoft Exchange environment was moved from HP and EMC disk to IBM XIV, with a noticeable performance improvement.
When a University health system replaced two competitive disk systems with XIV, their data center temperature dropped from 74 to 68 degrees Fahrenheit. In general, XIV systems are 20 to 30 percent more energy efficient per usable TB than traditional disk systems.
A service provider that had used EMC disk systems for over 10 years evaluated the IBM XIV versus upgrading to EMC V-Max. The three-year total cost of ownership (TCO) of EMC's V-Max was $7 million (US) higher, so EMC counter-proposed CLARiiON CX4 instead. But, in the end, IBM XIV proved to be the better fit, and now the customer is happy having made the switch.
The manager of an information communications technology service provider was impressed that the XIV was up and running in just a couple of days. They now have over two dozen XIV systems.
Another XIV client had lost all of their Computer Room Air Conditioning (CRAC) units for several hours. The data center heated up to 126 degrees Fahrenheit, but the customer did not lose any data on either of their two XIV boxes, which continued to run in these extreme conditions.
Optimizing XIV Performance - Brian Cormody
This session was an update from the [one presented last year] by Izhar Sharon. Brian presented various best practices for optimizing the performance when using specific application workloads with IBM XIV disk systems.
Oracle ASM: Many people allocate lots of small LUNs, because this made sense a long time ago when all you had was just a bunch of disks (JBOD). In fact, many of the practices that DBAs use to configure databases across disks become unnecessary with XIV. With XIV, you are better off allocating a small number of very large LUNs from the XIV. The best option was a 1-volume ASM pool with 8MB AU stripe. A single LUN can contain multiple Oracle databases. A single LUN can be used to store all of the logs.
VMware: Over 70 percent of XIV customers use it with VMware. For VMFS, IBM recommends allocating a small number of large LUNs. You can specify the maximum of 2181 GB. Do not use VMware's internal LUN extension capability, as IBM XIV already has thin provisioning, and it works better to let XIV do this for you. XIV Snapshots provide crash-consistent copies without all the overhead of VMware Snapshots.
SAP: For planning purposes, the "SAPS" unit equates roughly to 0.4 IOPS for ERP OLTP workloads, and 0.6 IOPS for BW/BI OLAP workloads. In general, an XIV can deliver 25,000 to 30,000 IOPS at 10-15 msec response time, and 60,000 IOPS at 30 msec response time. With SAP, our clients have managed to get 60,000 IOPS at less than 15 msec.
Microsoft Exchange: Even my friends in Redmond could not believe how awesome XIV was during ESRP testing. Five Exchange 2010 servers connected to a pair of XIV boxes using the new 2TB drawers managed 40,000 mailboxes at the high profile (0.15 IOPS per mailbox). Another client found four XIV boxes (720 drives) were able to handle 60,000 mailboxes (5GB max), which would have taken over 4000 drives if internal disk drives were used instead. Who said SANs are obsolete for MS Exchange?
Asynchronous Replication: IBM now has an "Async Calculator" to model and help design an XIV async replication solution. In general, dark fiber works best, and MPLS clouds had the worst results. The latest 10.2.2 microcode for the IBM XIV can now handle 10 Mbps at less than 250 msec roundtrip. During the initial sync between locations, IBM recommends setting the "schedule=never" to consume as much bandwidth as possible. If you don't trust the bandwidth measurements your telco provider is reporting, consider testing the bandwidth yourself with the [iPerf] open source tool.
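The SAP and Exchange rules of thumb above lend themselves to quick back-of-the-envelope sizing. The sketch below uses only the ratios quoted in the session. The function names and the 50,000 SAPS example are illustrative assumptions, and real sizing should of course come from proper planning tools.

```python
# Back-of-the-envelope sizing from the rules of thumb quoted above.
# The ratios come from the session recap; the function names and the
# example SAPS figure are illustrative assumptions.

IOPS_PER_SAPS = {"oltp": 0.4, "olap": 0.6}


def sap_iops(saps, workload):
    """Estimated IOPS for a SAP workload ('oltp' or 'olap')."""
    return saps * IOPS_PER_SAPS[workload]


def exchange_iops(mailboxes, iops_per_mailbox):
    """Estimated IOPS for an Exchange mailbox population."""
    return mailboxes * iops_per_mailbox


# Hypothetical 50,000 SAPS ERP system.
print(round(sap_iops(50_000, "oltp")))
# The 40,000-mailbox ESRP configuration at the 0.15 IOPS profile.
print(round(exchange_iops(40_000, 0.15)))
```

The 40,000-mailbox ESRP result works out to about 6,000 IOPS, which sits well inside the 25,000 to 30,000 IOPS range quoted for the lower response times.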
Every year, I teach hundreds of sellers how to sell IBM storage products. I have been doing this since the late 1990s, and it is one task that has carried forward from one job to another as I transitioned through various roles from development, to marketing, to consulting.
This week, I am in the city of Taipei [Taipei] to teach a Top Gun sales class, part of IBM's [Sales Training] curriculum. This is only my second time here on the island of Taiwan.
As you can see from this photo, Taipei is a large city with just row after row of buildings. The metropolitan area has about seven million people, and I saw lots of construction for more on my ride in from the airport.
The student body consists of IBM Business Partners and field sales reps eager to learn how to become better sellers. Typically, some of the students might have just been hired on or just finished IBM Sales School, a few might have transferred from selling other product lines, while others are established storage sellers looking for a refresher on the latest solutions and technologies.
I am part of the teach team comprised of seven instructors from different countries. Here is what the week entails for me:
Monday - I will present "Selling Scale-Out NAS Solutions" that covers the IBM SONAS appliance and gateway configurations, and be part of a panel discussion on Disk with several other experts.
Tuesday - I have two topics, "Selling Disk Virtualization Solutions" and "Selling Unified Storage Solutions", which cover the IBM SAN Volume Controller (SVC), Storwize V7000 and Storwize V7000 Unified products.
Wednesday - I will explain how to position and sell IBM products against the competition.
Thursday - I will present "Selling Infrastructure Management Solutions" and "Selling Unified Recovery Management Solutions", which focus on the IBM Tivoli Storage portfolio, including Tivoli Storage Productivity Center, Tivoli Storage Manager (TSM), and Tivoli Storage FlashCopy Manager (FCM). The day ends with the dreaded "Final Exam".
Friday - The students will present their "Team Value Workshop" presentations, and the class concludes with a formal graduation ceremony for the subset of students who pass. A few outstanding students will be honored with "Top Gun" status.
These are the solution areas I present most often as a consultant at the IBM Executive Briefing Center in Tucson, so I can provide real-life stories of different client situations to help illustrate my examples.
The weather here in Taipei calls for rain every day! I was able to take this photo on Sunday morning while it was still nice and clear, but later in the afternoon, we had quite the downpour. I am glad I brought my raincoat!
Back in February, my blog post [A Box Full of Floppies] mentioned that I uncovered some diskettes compressed with OS/2 Stacker. Jokingly, I suggested that I may have to stand up an OS/2 machine just to check out what is actually on those floppies. Each floppy contains only three files: README.STC, STACKER.EXE and a hidden STACKVOL.DSK file. The README.STC explains that the disk is compressed by Stacker, a program developed by [Stac Electronics, Inc.]. The STACKER.EXE would not run on Windows XP, Vista or Windows 7. The STACKVOL.DSK is just a huge binary file, like a ZIP file, compressed with the [Lempel-Ziv-Stac] algorithm that combines Lempel-Ziv with Huffman coding.
In my follow-up post [Like Sands in an Hourglass], I explained how there are many ways I could have tackled this project. I could either use the Emulation approach and try to build an OS/2 guest image under a hypervisor like VMware, KVM or VirtualBox, or just take the Museum approach and try taking one of my half dozen old machines, wipe it clean and stand up OS/2 on it bare metal. This turned out to be more challenging than I expected. The systems I have that are modern and powerful enough to run hypervisors don't have floppy drives, so I opted for the Museum approach.
(A quick [history of OS/2] might be helpful. IBM and Microsoft jointly developed OS/2 back in 1985. By 1990, Microsoft decided its own Windows operating system was more popular with the ladies, and decided to break off with IBM. In 1992, IBM released OS/2 version 2.0, touted as "a better DOS than DOS and a better Windows than Windows!" Both parties maintained ownership rights, and Microsoft renamed OS/2 to Windows NT. The "NT" stood for New Technology, the basis for all of the enterprise-class Windows servers used today. IBM named its version of OS/2 version 3 and 4 "WARP", with the last version 4.52 released in 2001. In its heyday, OS/2 ran the majority of Automated Teller Machines (ATMs), was used for hardware management consoles (HMC), and was used worldwide to run various Railway systems. After 2001, IBM encouraged people to transition from Windows or OS/2 over to Java and Linux. For those that can't or won't leave OS/2, IBM partnered with Serenity Systems to continue OS/2 under the brand [eComStation].)
Working with an IBM [ThinkCentre 8195-E2U Pentium 4 machine] with 640MB RAM and 80GB hard disk, a CD-rom and one 3.5-inch floppy drive, I first discovered that OS/2 is limited to very small amounts of hard disk. There are limits on [file systems and partition sizes] as well as the infamous [1024-cylinder limit] for bootable operating systems. Having a completely empty drive didn't work, as the size of the disk was too big. Carving out a big partition out of this also failed, as it exceeded the various limits. Each time, it felt the partition table was corrupted because the values were so huge. Even modern Disk Partitioning tools ([SysRescueCD] or [PartedMagic]) didn't work, as these create partitions not recognizable to OS/2.
The next obstacle I knew I would encounter would be device drivers. OS/2 comes as a set of three floppy diskettes and a CD-rom. The bootable installation disk was referred to affectionately as "Disk 0", then Disk 1, then Disk 2. Once all drivers have been loaded into memory, the installer can start looking at the CD-rom and continue with the installation. In searching for updated drivers, I came across [Updated OS/2 Warp 4 Installation Diskettes] to address problems with newer display monitors. It also addresses the 8.4GB volume limit.
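The 1024-cylinder limit and the 8.4GB volume limit mentioned above are related by simple arithmetic. The sketch below assumes the commonly cited BIOS geometry ceiling of 255 heads and 63 sectors per track with 512-byte sectors. Those geometry values are my assumption, not something taken from the OS/2 documentation.

```python
SECTOR_BYTES = 512  # assumed sector size


def chs_capacity(cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Bytes addressable with cylinder/head/sector (CHS) addressing."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES


# Assumed maximum geometry: 1024 cylinders, 255 heads, 63 sectors per track
chs_limit = chs_capacity(1024, 255, 63)
print(chs_limit)                  # 8422686720
print(round(chs_limit / 1e9, 1))  # 8.4 (decimal GB)

# The 80GB drive in this machine is far beyond what CHS addressing can describe
drive_bytes = 80 * 10**9
print(drive_bytes > chs_limit)    # True
```

This is consistent with the behavior described above: an empty 80GB drive, or one big partition carved from it, produces values far larger than the older limits can express.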
The updates were in the form of EXE files that only execute in a running DOS or OS/2 environment, expanded onto a floppy diskette. It seemed like a [Catch-22]: I needed a working DOS or OS/2 system to run the update programs to create the diskettes, but needed the diskettes to build a working system.
To get around this, I decided to take a "scaffolding" approach. Using a DOS 6 bootable floppy, I was able to re-partition the drive with FDISK into two small 1.9GB partitions. Since I have the full five-floppy IBM DOS 6 set, I hid the first partition for OS/2, and installed the DOS 6 GUI on the second partition. I went ahead and added a few new subdirectories: BOOT to hold Grub2, PERSONAL to hold the data I decompress from the floppies, and UTILS to hold additional utilities. This little DOS system worked, and I now have new OS/2 "Disk 1" and "Disk 2" for the installation process.
(If you don't have a full set of DOS installation diskettes, you can make do with "FORMAT C: /S" from a [DOS boot disk], and then just copy over all the files from the boot disk to your C: drive. You won't have a nice DOS GUI, but the command line prompt will be enough to proceed.)
Like DOS, OS/2 expects to be installed on the C: drive. I hid the second partition (DOS), and marked the first partition installable and bootable. The OS/2 installation involves a lot of reboots, and the hard drive is not natively bootable in the intermediate stages. This means having to boot from Disk 0, then putting in Disk 1, then Disk 2, before continuing the next phase of the installation. I tried to keep the installation as "Plain Vanilla" as possible.
I had to figure out what to include, and what to exclude, and this involved a lot of trial and error. For example, one of the choices was for "external diskette support". Since I had an "internal diskette drive", I didn't think I needed it. But after a full install, I discovered that it would not read or write floppy diskettes, so it appears that I do indeed need this support.
OS/2 supports two different file systems, FAT16 and the High Performance File System (HPFS). Since my partition was only 1.9GB in size, I chose just to use FAT16. HPFS supported larger disk partitions, longer file names, and faster performance, none of which I need for these purposes.
I thought it would be nice to get TCP/IP networking to work with my Ethernet card. However, after many attempts, I decided against this. I needed to focus on my mission, which was to decompress floppy diskettes. It was amusing to see that OS/2 supported all kinds of networking, including Token Ring, System Management, Remote Access, Mobile Access Services, File and Print.
Once all the options are chosen, OS/2 installation then proceeds to unpack and copy all the programs to the C: drive. During this process, IBM had informational splash screens. Here's one that caught my eye, titled "IBM Means Three Things" that listed three reasons to partner with IBM:
Providing global solutions for a small planet
Creating and Applying advanced technologies to improve with which customers run their businesses
Constantly improving customer service with the products and services we provide
You might wonder how these OS/2 splash screens, written over 10 years ago, can appear almost identical to IBM's current [Smarter Planet] campaign. Actually, it is not that odd. IBM has been keeping to these same core principles since 1911; only the words to describe and promote these core values have changed.
To access both OS/2 and DOS partitions, I installed Grand Unified Bootloader [Grub2] on the DOS partition under C:/BOOT/GRUB directory. However, when I boot OS/2, I cannot see the DOS partition. And when I boot DOS, I cannot see the OS/2 partition. Each operating system thinks its C: drive is the only partition on the system.
Now that I had OS/2 running, I was then able to install Stacker from two floppy diskettes. With this installed, I can compress and decompress data on either the hard disk, or on floppy diskettes. Most of the files were flat text documents and digital photos. After copying the data off the compressed disks onto my hard drive, I now can copy them off to a safe place.
To finish this project, I installed Ubuntu Linux on the remaining 76GB of disk space. Linux can access the FAT16 file systems on both the OS/2 and DOS drives natively, which allows me to copy files from OS/2 to DOS or vice versa.
Now that I know what data types are on the diskettes, I determined that I could have decompressed the data in just a few steps:
Set up a DOS partition on C: drive
Insert one of the compressed diskettes into the floppy drive
Copy the STACKER.EXE program from the floppy to the C: drive
Run "STACKER A:" to decompress the floppy diskette
However, now that I have a working DOS and OS/2 system, I can possibly review the rest of my floppy diskettes, some of which may require running programs natively on OS/2 or DOS. This brings me to an important lesson. If you are going to keep archive data for long-term retention, you need to choose file formats that can be read by current operating systems and programs. Installing older operating systems and programs to access proprietary formats can be quite time-consuming, and may not always be possible or desirable.
When I turned on the television last weekend, I saw large waves of water knock down rows of small houses. I thought I had caught the end of a bad Godzilla movie, but sadly it was not movie special effects. Mother Nature can be quite destructive. Over the past four days, Japan has been hit hard by a series of earthquakes and resulting tsunami.
(Note: Disasters can happen anywhere and at any time. Last month, New Zealand had an earthquake as well. It is best to always be prepared. If you haven't done so lately, check out the latest recommendations from the US Government [Ready.Gov] website.)
Several have asked me how this tragedy in Japan might affect IBM and its clients. Here is what I have gathered from various sources. All IBM Japan employees have survived, are safe and reporting no major injuries. IBM has four major facilities, near the central part of the country around Tokyo, far from Sendai, the epicenter. All IBM buildings are still standing and operational. A few sections of Tokyo are affected by scheduled brown-outs in an effort to save electricity. Employees are asked to telecommute (a.k.a. work from home) to minimize traffic congestion.
Hakozaki - Headquarters and executive briefing center
Makuhari - Technical Center, where we often hold conferences and other events
Yamato - Research Facility, where R&D is done for IBM tape storage products
Toyosu - Service Delivery Center
I have been to Japan many times throughout my career. Back in the summer of 1995, IBM sent me to Osaka to help out clients in the aftermath of the Great Hanshin earthquake near Kobe. I remember it well, sending an email back to my team saying "It is 1995, and here in Japan it is 95 degrees and 95 percent humidity." It was seven months after the earthquake, but people were still living in cardboard boxes and make-shift tents.
Many people asked if I will be going back to Japan to help out. I speak Japanese, can make sense of the Japanese Katakana characters on computer monitors, and am an expert in Disaster Recovery. However, the IBM Japan team is doing an awesome job helping our clients restore their data and recover their business operations. Of course, if IBM needs me in Japan, I will gladly go, but so far, it doesn't seem that I am needed there.
While most of the post is accurate and well-stated, two opinions in particular caught my eye. I'll be nice and call them opinions, since these are blogs, and always subject to interpretation. I'll put quotes around them so that people will correctly relate these to Hu, and not me.
"Storage virtualization can only be done in a storage controller. Currently Hitachi is the only vendor to provide this." -- Hu Yoshida
Hu, I enjoy all of your blog entries, but you should know better. HDS is a fairly new-comer to the storage virtualization arena, and since IBM has been doing this for decades, I will bring you and the rest of the readers up to speed. I am not starting a blog-fight, just want to provide some additional information for clients to consider when making choices in the marketplace.
First, let's clarify the terminology. I will use 'storage' in the broad sense, including anything that can hold 1's and 0's, including memory, spinning disk media, and plastic tape media. These all have different mechanisms and access methods, based on their physical geometry and characteristics. The concept of 'virtualization' is any technology that makes one set of resources look like another set of resources with more preferable characteristics, and this applies to storage as well as servers and networks. Finally, 'storage controller' is any device with the intelligence to talk to a server and handle its read and write requests.
Second, let's take a look at all the different flavors of storage virtualization that IBM has developed over the past 30 years.
IBM introduces the S/370 with the OS/VS1 operating system. "VS" here refers to virtual storage, and in this case internal server memory was swapped out to physical disk. Using a table mapping, disk was made to look like an extension of main memory.
IBM introduces the IBM 3850 Mass Storage System (MSS). Until this time, programs that ran on mainframes had to be acutely aware of the device types being written, as each device type had different block, track and cylinder sizes, so a program written for one device type would have to be modified to work with a different device type. The MSS was able to take four 3350 disks, and a lot of tapes, and make them look like older 3330 disks, since most programs were still written for the 3330 format. The MSS was a way to deliver new 3350 disk to a 3330-oriented ecosystem, and greatly reduce the cost by handling tape on the back end. The table mapping was one virtual 3330 disk (100 MB) to two physical tapes (50 MB each). Back then, all of the mainframe disk systems had separate controllers. The 3850 used a 3831 controller that talked to the servers.
IBM invents Redundant Array of Independent Disks (RAID) technology. The table mapping is one or more virtual "Logical Units" (or "LUNs") to two or more physical disks. Data is striped, mirrored and paritied across the physical drives, making the LUNs look and feel like disks, but with faster performance and higher reliability than the physical drives they were mapped to. RAID could be implemented in the server as software, on top or embedded into the operating system, in the host bus adapter, or on the controller itself. The vendor that provided the RAID software or HBA did not have to be the same as the vendor that provided the disk, so in a sense, this avoided "vendor lock-in". Today, RAID is almost always done in the external storage controller.
IBM introduces the Personal Computer. One of the features of DOS is the ability to make a "RAM drive". This is technology that runs in the operating system to make internal memory look and feel like an external drive letter. Applications that already knew how to read and write to drive letters could work unmodified with these new RAM drives. This had the advantage that the files would be erased when the system was turned off, so it was perfect for temporary files. Of course, other operating systems today have this feature: UNIX has a /tmp directory in memory, and z/OS uses VIO storage pools.
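To illustrate how parity gives a LUN higher reliability than the drives behind it, here is a minimal Python sketch of XOR parity across one stripe. The block contents and the three-data-plus-one-parity layout are assumptions for illustration, not a description of any particular RAID product.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for k, byte in enumerate(block):
            result[k] ^= byte
    return bytes(result)


# One stripe spread over three hypothetical data disks, plus a parity disk
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# Simulate losing the second data disk and rebuilding it from the survivors
survivors = [stripe[0], stripe[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == stripe[1])  # True
```

Because XOR is its own inverse, any single missing block in the stripe can be recomputed from the remaining blocks and the parity.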
This is important, as memory would be made to look like disk externally, as "cache", in the 1990s.
IBM AIX v3 introduces Logical Volume Manager (LVM). LVM maps the LUNs from external RAID controllers into virtual disks inside the UNIX server. The mapping can combine the capacity of multiple physical LUNs into a large internal volume. This was all done by software within the server, completely independent of the storage vendor, so again no lock-in.
IBM introduces the Virtual Tape Server (VTS). This was a disk array that emulated a tape library. A mapping of virtual tapes to physical tapes was done to allow full utilization of larger and larger tape cartridges. While many people today mistakenly equate "storage virtualization" with "disk virtualization", in reality it can be implemented on other forms of storage. The disk array was referred to as the "Tape Volume Cache". By using disk, the VTS could mount an empty "scratch" tape instantaneously, since no physical tape had to be mounted for this purpose.
Contradicting its "tape is dead" mantra, EMC later developed its CLARiiON disk library that emulates a virtual tape library (VTL).
IBM introduces the SAN Volume Controller. It involves mapping virtual disks to managed disks that could be from different frames from different vendors. Like other controllers, the SVC has multiple processors and cache memory, with the intelligence to talk to servers, and is similar in functionality to the controller components you might find inside monolithic "controller+disk" configurations like the IBM DS8300, EMC Symmetrix, or HDS TagmaStore USP. SVC can map the virtual disk to physical disk one-for-one in "image mode", as HDS does, or can also map virtual disks across physical managed disks, using a similar mapping table, to provide advantages like performance improvement through striping. You can take any virtual disk out of the SVC system simply by migrating it back to "image mode" and disconnecting the LUN from management. Again, no vendor lock-in.
The HDS USP and NSC can run as regular disk systems without virtualization, or the virtualization can be enabled to allow external disks from other vendors. HDS usually counts all USP and NSC units sold, but never mentions what percentage of these have external disks attached in virtualization mode. Either they don't track this, or they are too embarrassed to publish the number. (My guess: single digit percentage).
Few people remember that IBM also introduced virtualization in both controller+disk and SAN switch form factors. The controller+disk version was called "SAN Integration Server", but people didn't like the "vendor lock-in" having to buy the internal disk from IBM. They preferred having it all external disk, with plenty of vendor choices. This is perhaps why Hitachi now offers a disk-less version of the NSC 55, in an attempt to be more like IBM's SVC.
IBM also had introduced the IBM SVC for Cisco 9000 blade. Our clients didn't want to upgrade their SAN switch networking gear just to get the benefits of disk virtualization. Perhaps this is the same reason EMC has done so poorly with its "Invista" offering.
So, bottom line, storage virtualization can be, and has been, delivered in the operating system software, in the server's host bus adapter, inside SAN switches, and in storage controllers. It can be delivered anywhere in the path between application and physical media. Today, the two major vendors that provide disk virtualization "in the storage controller" are IBM and HDS, and the three major vendors that provide tape virtualization "in the storage controller" are IBM, Sun/STK, and EMC. All of these involve a mapping of logical to physical resources. Hitachi uses a one-for-one mapping, whereas IBM additionally offers more sophisticated mappings as well.
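The difference between a one-for-one mapping and a striped mapping can be sketched in a few lines of Python. This is only an illustration of the idea of a logical-to-physical mapping table. The function names, the disk names and the round-robin extent placement are assumptions, not the actual allocation scheme of the SVC or any HDS product.

```python
def image_mode(virtual_block: int, mdisk: str = "mdisk0"):
    """One-for-one: virtual block N lives at block N of a single managed disk."""
    return (mdisk, virtual_block)


def striped(virtual_block: int, mdisks, extent_blocks: int):
    """Spread fixed-size extents round-robin across several managed disks."""
    extent, offset = divmod(virtual_block, extent_blocks)
    mdisk = mdisks[extent % len(mdisks)]      # which disk holds this extent
    physical_extent = extent // len(mdisks)   # extent position on that disk
    return (mdisk, physical_extent * extent_blocks + offset)


pool = ["vendorA_lun", "vendorB_lun", "vendorC_lun"]  # hypothetical managed disks
print(image_mode(13))         # ('mdisk0', 13)
print(striped(0, pool, 4))    # ('vendorA_lun', 0)
print(striped(5, pool, 4))    # ('vendorB_lun', 1)
print(striped(13, pool, 4))   # ('vendorA_lun', 5)
```

In the striped case, consecutive extents land on different back-end disks, which is where the performance benefit comes from. In image mode the layout is untouched, which is why a disk can be pulled back out of virtualization without copying data.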
Well, it's Tuesday again, and you know what that means! IBM announcements!
Today, I am in New York visiting clients. The weather is a lot nicer than I expected. Here is a picture of the Hudson River through some trees with leaves turning color. Something we don't see in Tucson! Our cactus and pine trees stay green year-round!
The announcements today center around the IBM PureSystems family of expert integrated systems. The PureFlex is based on Flex System components. The Flex System chassis is 10U high and holds 14 bays, arranged as 7 rows by 2 columns. Compute and Storage nodes fit in the front, and switches, fans and power supplies in the back. Here is a quick recap:
IBM Flex System Compute Nodes
The x220 Compute Node is a single-bay low-power 2-socket x86 server. The x440 Compute Node is a powerful double-bay server (1 row, 2 columns). The p260 Compute Node is a single-bay server based on the latest POWER7+ processor.
IBM Flex System Expansion Nodes
Do you remember those old movies where a motorcycle would have a sidecar that could hold another passenger, or extra cargo? IBM introduces "Expansion Nodes" for the x200 series single-bay Compute nodes. The idea here is that in a single column, you have one bay for the Compute node, and then on the side in the next bay (same column) you have an Expansion node. There are two choices:
Storage Expansion Node allows you to have eight additional drives
PCIe Expansion Node allows you to have four PCIe cards, which could include the SSD-based PCIe cards from IBM's recent acquisition, Texas Memory Systems.
There are times where one or two internal drives are just not enough storage for a single server, and these expansion nodes could just be the perfect solution for some use cases.
IBM Flex System V7000 Storage Node
I saved the best for last! The Flex System V7000 Storage Node is basically the IBM Storwize V7000 repackaged to fit into the Flex System chassis. This means that in the front of the chassis, the Flex System V7000 takes up four bays (2 rows by 2 columns). In the back of the chassis are the power supplies, fans and switches.
The new Flex System V7000 supports everything the Storwize V7000 does except the upgrade to "Unified" through file modules. For those who want to have Storwize V7000 Unified in their PureFlex systems, IBM will continue to offer the outside-the-chassis original Storwize V7000 that can have two file modules added for NFS, CIFS, HTTPS, FTP and SCP protocol support.
IBM Flex System Converged Network Switch
The Converged Network Switch provides Fibre Channel over Ethernet (FCoE) directly from the chassis. This eliminates the need for a separate "Top-of-Rack" switch, and allows the new Flex System V7000 Storage Node to externally virtualize FCoE-based disk arrays.
Patterns of Expertise for Infrastructure
The original patterns of expertise focused on the PureApplication Systems. Now IBM has added some for the Infrastructure on PureFlex systems.
IBM has sold over 1,000 Flex System and PureFlex systems, across 40 different countries around the world, since their introduction a few months ago in April! These latest enhancements will help solidify IBM's industry leadership.
Can Structured Query Language [SQL] be considered a storage protocol?
Several months ago, I was asked to review a book on SQL, titled appropriately enough "The Complete Idiot's Guide to SQL", by Steven Holzner, Ph.D. As a published author myself, I get a lot of these requests, and I agreed in this case, given that SQL was invented by IBM, and is a good fundamental skill to have for Business Analytics and Database Management.
(FTC Disclosure: I work for IBM but was not part of the SQL development team. I was provided a copy of this book for free to review it. I was not paid to mention this book, nor told what to write. I do not know the author personally nor anyone that works for his publicist. All of my opinions of the book in this blog post are my own.)
Despite an agreed-upon standard for SQL, each relational database management system (RDBMS) has decided to customize it for their own purposes. First, SQL can be quite wordy, so some RDBMS have made certain keywords optional. Second, RDBMS offer extra features by adding keywords or programming language extensions, options or parameters above and beyond what the SQL standard calls for. Third, the SQL standard has changed over the years, and some RDBMS have opted to keep some backward compatibility with their prior releases. Fourth, some RDBMS want to discourage people from easily porting code from one RDBMS to another, known in the industry as vendor lock-in.
Throughout my career, I have managed various databases, including Informix, DB2, MySQL, and Microsoft SQL Server, so I am quite familiar with the differences in SQL and the problems and implications that arise.
Most authors who want to write about SQL typically make a choice between (a) sticking to the SQL standard, and expecting the reader to customize the examples to their particular RDBMS; or (b) sticking to a single RDBMS implementation, and offering examples that may not work on other RDBMS.
I found the book "The Complete Idiot's Guide to SQL" covered the basics quite well, but with an odd twist. The basics include creating databases and tables, defining columns, inserting and deleting rows, updating fields, and performing queries or joins. The odd twist is that Steven does not make the typical choice above, but rather shows how the various DBMS are different than standard SQL syntax, with actual working examples for different RDBMS.
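As a small illustration of the kind of dialect difference involved, here is a Python sketch that is not taken from the book. It uses the built-in sqlite3 module, with SQLite standing in as yet another dialect. The `books` table and its contents are hypothetical. Returning only the first two rows is written differently depending on the RDBMS:

```python
import sqlite3

# The same "first two rows" query in three dialects (shown as text only)
ROW_LIMIT_SYNTAX = {
    "SQL standard / DB2": "SELECT name FROM books ORDER BY name FETCH FIRST 2 ROWS ONLY",
    "MySQL": "SELECT name FROM books ORDER BY name LIMIT 2",
    "Microsoft SQL Server": "SELECT TOP 2 name FROM books ORDER BY name",
}
for dialect, statement in ROW_LIMIT_SYNTAX.items():
    print(f"{dialect}: {statement}")

# SQLite happens to accept the LIMIT form, so that variant can be run here
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (name TEXT)")  # hypothetical table
conn.executemany("INSERT INTO books VALUES (?)", [("C",), ("A",), ("B",)])
rows = conn.execute(ROW_LIMIT_SYNTAX["MySQL"]).fetchall()
print(rows)  # [('A',), ('B',)]
conn.close()
```

A query that runs cleanly on one system can be a syntax error on another, which is exactly the porting problem described above.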
You might be thinking to yourself that only an idiot would work in a place that had to require knowledge of multiple RDBMS. The sad truth is that most of the medium and large companies I speak to have two or more in production. This is either through acquisitions, or in some cases, individual business units or departments implementing their own via [Shadow IT].
(For those who want to learn SQL and try out the examples in this book, IBM offers a free version of DB2 called [DB2-C Express] that runs on Windows, Linux, Mac OS, and Solaris.)
Last week, while I was in Russia for the [Edge Comes to You] event, I was interviewed by a journalist from [Storage News] on various topics. One question struck me as strange. He asked why I did not mention IBM's acquisition of Netezza in my keynote session about storage. I had to explain that Netezza was not in the IBM System Storage product line; it is in a different group, under Business Analytics, where it belongs.
While it is true that Netezza can store data, because it has storage components inside, the same could also be said about nearly every other piece of IT equipment, from servers with internal disk, to digital cameras, smart phones and portable music players. They can all be considered storage devices, but doing so would undermine what differentiates them from one another.
Which brings me back to my original question: Should we consider SQL to be a storage protocol? For the longest time, IT folks only considered block-based interfaces as storage protocols, then we added file-based interfaces like CIFS and NFS, and we also have object-based interfaces, such as IBM's Object Access Method (OAM) and the System Storage Archive Manager (SSAM) API. Could SQL interfaces be the next storage protocol?
Let me know what you think on this. Leave a comment below.
This week, Hitachi Ltd. announced their next generation disk storage virtualization array, the Virtual Storage Platform, following on the success of its USP V line. It didn't take long for fellow blogger Chuck Hollis (EMC) to comment on this in his blog post [Hitachi's New VSP: Separating The Wheat From The Chaff]. Here are some excerpts:
"Well, we all knew that Hitachi (through HDS and HP) would be announcing some sort of refresh to their high-end storage platform sooner or later.
As EMC is Hitachi's only viable competitor in this part of the market, I think people are expecting me to say something.
If you're a high-end storage kind of person, your universe is basically a binary star: EMC and Hitachi orbiting each other, with the interesting occasional sideshow from other vendors trying to claim relevance in this space."
Chuck implies that neither Hewlett-Packard (HP) nor Hitachi Data Systems (HDS) as vendors provide any value-add to the box manufactured by Hitachi Ltd., so he combines them into a single category. I suspect the HP and HDS folks might disagree with that opinion.
When I reminded Chuck that IBM was also a major player in the high-end disk space, his response included the following gem:
"Many of us in the storage industry believe that IBM currently does not field a competitive high-end storage platform. IDC market share numbers bear out this assertion, as you probably know."
While Chuck is certainly entitled to his own beliefs and opinions, believing the world is flat does not make it so. Certainly, I doubt IDC or any other market research firm has put out a survey asking "Do you think IBM offers a competitive high-end disk storage platform?" Of course, if Chuck is basing his opinion on anecdotal conversations with existing EMC customers, I can certainly see how he might have formed this misperception. However, IDC market share numbers don't support Chuck's assertion at all.
There is no industry-standard definition of what is a "high-end" or "enterprise-class" disk system. Some define high-end as having the option for mainframe attachment via ESCON and/or FICON protocol. Others might focus on features, functionality, scalability and high 99.999+ percent availability. Others insist high-end requires block-oriented protocols like FC and iSCSI, rather than file-based NAS protocols like NFS and CIFS.
For the most demanding mission-critical mix of random and sequential workloads, IBM offers the [IBM System Storage DS8000 series] high-end disk system which connects to mainframes and distributed servers, via FCP and FICON attachment, and supports a variety of drive types and RAID levels. The features that HP and HDS are touting today for the VSP are already available on the IBM DS8000, including sub-LUN automatic tiering between Solid-State drives and spinning disk, called [Easy Tier], thin provisioning, wide striping, point-in-time copies, and long distance synchronous and asynchronous replication.
There are lots of analysts that track market share for the IT storage industry, but since Chuck mentions [IDC] specifically, I reviewed the most recent IDC data, published a few weeks ago in their "IDC Worldwide Quarterly Disk Storage Tracker" for 2Q 2010, representing April 1 to June 30, 2010 sales. Just in case any of the rankings have changed over time, I also looked at the previous four quarters: 2Q 2009, 3Q 2009, 4Q 2009 and 1Q 2010.
(Note: IDC considers its analysis proprietary. Out of respect for their business model, I will not publish any of the actual facts and figures they have collected. If you would like to get any of the IDC data to form your own opinion, contact them directly.)
In the case of IDC, they divide the disk systems into three storage classes: entry-level, midrange and high-end. Their definition of "high-end" is external RAID-protected disk storage that sells for $250,000 USD or more, representing roughly 25 to 30 percent of the external disk storage market overall. Here are IDC's rankings of the four major players for high-end disk systems:
By either measure of market share, units (disk systems) or revenue (US dollars), IDC reports that IBM high-end disk outsold both HDS and HP combined. This has been true for the past five quarters. If a smaller start-up vendor has single digit percent market share, I could accept it being counted as part of Chuck's "occasional sideshow from other vendors trying to claim relevance", but IBM high-end disk has consistently had 20 to 30 percent market share over the past five quarters!
Not all of these high-end disk systems are connected to mainframes. According to IDC data, only about 15 to 25 percent of these boxes are counted under their "Mainframe" topology.
Chuck further writes:
"It's reasonable to expect IBM to sell a respectable amount of storage with their mainframes using a protocol of their own design -- although IBM's two competitors in this rather proprietary space (notably EMC and Hitachi) sell more together than does IBM."
The IDC data doesn't support that claim either, Chuck. By either measure of market share, units (disk systems) or revenue (US dollars), IDC reports that IBM disk for mainframes outsold all other vendors (including EMC, HDS, and HP) combined. And again, this has been true for the past five quarters. Here is the IDC ranking for mainframe disk storage:
IBM has over 50 percent market share in this case, primarily because IBM System Storage DS8000 is the industry leader in mainframe-related features and functions, and offers synergy with the rest of the z/Architecture stack.
So Chuck, I am not picking a fight with you or asking you to retract or correct your blog post. Your main theme, that the new VSP presents serious competition to EMC's VMAX high-end disk arrays, is certainly something I can agree with. Congratulations to HDS and HP for putting forth what looks like a viable alternative to EMC's VMAX.
To learn more about IBM's upcoming products, register for next week's webcast "Taming the Information Explosion with IBM Storage" featuring Dan Galvan, IBM Vice President, and Steve Duplessie, Senior Analyst and Founder of Enterprise Storage Group (ESG).
Well, it's Tuesday again, and you know what that means! IBM Announcements! Typically, IBM System Storage has three to five major product launches per year. Making announcements every Tuesday would have been too frequent, and having one big announcement every two or three years would be too far apart. Worldwide combined revenues for storage hardware and software grew double digits last year, comparing full-year 2011 to the prior 2010 year, and I am sure that 2012 will be a good year for IBM as well! This week we have announcements for both disk and tape, but since 2012 is the 60th Diamond Anniversary for tape, I will start with tape systems first.
TS1140 support for JA/JJ tape cartridges
The TS1140 enterprise tape drive was announced at the [Storage Innovation Executive Summit] last May. It supported a new E07 format on three new tape cartridge types: "JC" was the 4.0TB standard rewriteable tape, "JY" was the 4.0TB WORM tape, and "JK" was the 500GB economy tape that was less expensive but offered faster random access.
Generally, IBM has adopted an N-2 read, N-1 write [backward compatibility] policy. This means that the TS1140 could read E05 and E06 formatted tapes on JB and JX media, and could write E06 format on JB and JX media. However, there are a lot of older JA and JJ media, especially as part of TS7740 environments, so IBM now supports TS1140 drives reading J1A formatted JA and JJ media. This is not just for TS7740 environments; any TS1140 in stand-alone or tape library configurations will support this as well.
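To make the N-2 read, N-1 write rule concrete, here is a minimal Python sketch. The function names and the exception set are my own illustration, not an IBM API, and the sketch looks only at recording formats (J1A, E05, E06, E07), ignoring which media types (JA, JB, JC and so on) each format can be written on.

```python
# Recording formats in order of introduction, oldest first.
FORMATS = ["J1A", "E05", "E06", "E07"]

# Hypothetical exception table for this sketch: per the announcement,
# an E07 drive (TS1140) can now also read J1A formatted tapes.
EXTRA_READ = {("E07", "J1A")}


def generations_back(drive_format, tape_format):
    """How many generations older the tape format is than the drive's native format."""
    return FORMATS.index(drive_format) - FORMATS.index(tape_format)


def can_read(drive_format, tape_format, allow_exceptions=True):
    """General N-2 read rule, plus any announced exceptions."""
    if allow_exceptions and (drive_format, tape_format) in EXTRA_READ:
        return True
    return 0 <= generations_back(drive_format, tape_format) <= 2


def can_write(drive_format, tape_format):
    """General N-1 write rule."""
    return 0 <= generations_back(drive_format, tape_format) <= 1


for fmt in FORMATS:
    print(fmt, "read:", can_read("E07", fmt), "write:", can_write("E07", fmt))
```

Under the general rule alone, J1A falls three generations behind E07 and would not be readable, which is why the new support is worth announcing.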
TS7700 R2.1 enhancements
IBM is a leader in tape virtualization with or without physical tape as back-end media. There are two hardware models of the [IBM Virtualization Engine TS7700 family] for the IBM System z mainframe. These virtual libraries are referred to as "clusters" in IBM literature.
The TS7740 Virtual Tape Library supports putting virtual tape images on disk first, then moving less-active data to physical tape, which I covered in my blog post [IBM Announcements - July 2007].
A unique feature of the TS7700 series is support for a Grid configuration, which allows up to six different TS7700 clusters to be grouped into a single instance image. These clusters can be in local or remote locations, connected via WAN or LAN connections.
R2.1 is the latest software release of IBM's successful TS7700 series.
True Sync Mode Copy. Before R2.1, the TS7700 offered "immediate mode copy". An application would write to a virtual tape, and when it was done with the tape and performed an unmount, the TS7700 would then replicate the tape contents to a secondary cluster on the grid. With True Sync Mode, data contents are replicated per implicit or explicit SYNC points. This is another IBM first in the IT tape industry.
Remote Mount Fail-over. When you have two or more TS7700 clusters in a grid configuration, you can do remote mounts. We've added fail-over multi-pathing up to four paths, so that if a link to a remote cluster is down, it will try one of the others instead.
Parallel Copies and Pre-Migration. One of my 19 patents is for the pre-migration feature for the IBM 3494 Virtual Tape Server (VTS) that carries forward into the TS7700, and is also used in the SONAS and Information Archive products. However, when the grid architecture was introduced, the engineers decided not to allow pre-migration and copies to secondary clusters to occur concurrently. Now these two operations can be done in parallel.
Merge two grids into one grid. Now that we can support up to six clusters in a single grid, we have people with 2-cluster and 3-cluster grids looking to merge them into one. Of course, all the logical and physical volume serials (VOLSER) must be unique!
Accelerate off JA/JJ Media. There are a lot of older JA and JJ media still in TS7700 libraries. This feature allows customers to speed up the transition to newer physical tape media.
Copy Export to E06 format on JB media. This one is clever, and I have to say I would have never thought about it. Let's say you have a TS7740 with TS1140 drives, but you want to export some virtual tapes to physical media to be sent to someone who only has a TS7740 connected with older TS1130 drives. These older drives can't read new JC media nor make sense of the E07 format. This feature will let you export to older JB media in E06 format so that it will be fully readable at the new location on the TS1130 drives.
Copy Export Merge service offering. Thanks to mergers and acquisitions, it is sometimes necessary to split off a portion of data from a TS7700 grid. In the past, IBM supported sending this export to a completely empty TS7700 library, but this new service offering allows the export to be merged into an existing TS7700 that already contains data.
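For the grid merge mentioned above, the uniqueness requirement on volume serials amounts to checking that the two sets of VOLSERs do not overlap. Here is a minimal sketch in Python; the function name and the sample serials are made up for illustration, and a real merge would of course be planned with IBM tooling and services rather than a script like this.

```python
def volser_conflicts(grid_a_volsers, grid_b_volsers):
    """Return the sorted list of VOLSERs that appear in both grids."""
    return sorted(set(grid_a_volsers) & set(grid_b_volsers))


# Hypothetical volume serials for two grids to be merged.
grid_a = ["A00001", "A00002", "Z99999"]
grid_b = ["B00001", "Z99999"]

conflicts = volser_conflicts(grid_a, grid_b)
if conflicts:
    print("Resolve before merging:", conflicts)
else:
    print("No overlapping VOLSERs")
```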
LTFS-SDE support for Mac OS X 10.7 Lion
How do people still not know about the Linear Tape File System [LTFS]? I mentioned this in my blogs back in 2010 in [April], [September], and [November]. Last year, LTFS won the [NAB Show Pick Hits Award] and an [Emmy] for revolutionizing the use of digital tape in television broadcasting.
In layman's terms, the Single Drive Edition [LTFS-SDE] allows a tape cartridge to be treated like a USB memory stick. It is supported on the LTO5 tape drives for systems running various levels of Windows, Linux and Mac OS X. Prior to this announcement, IBM supported Leopard (10.5.6) and Snow Leopard (10.6), and now supports the Mac OS X 10.7 "Lion" release.
IBM first introduced Solid-State Drives (SSD) back in 2007 where it made sense the most, in [drive-for-drive replacements on blade servers in the IBM BladeCenter]. Blade servers typically only have a single drive, and SSD are both faster and use less energy on a drive-for-drive comparison, so this provided immediate benefit. Today, SSD are available on a variety of System x and POWER system servers.
In 2008, IBM rocked the world by being the first to reach [1 Million IOPS with Project Quicksilver]. This was an all-SSD configuration which many considered unrealistic (at the time), but it showed the potential for solid state drives.
When the [XIV Gen3 was Announced - July 2011], each module included a 1.8-inch "SSD-Ready" slot in the back. IBM made a Statement of Direction that IBM would someday offer SSD drives to put in these slots. Today's announcement is that IBM has finalized the qualification process, so now XIV Gen3 clients can have 400GB of usable non-volatile SSD read cache added to each module. This SSD can be added to existing XIV Gen3 boxes in the field, or it can be factory-installed in new shipments. If you have a 15-module XIV, that's 6TB of additional read cache! This SSD is entirely managed by the XIV Gen3, so you won't have to spend weeks reading manuals or specifying configuration parameters.
When you carve volumes on the XIV, you now have an option to enable or disable use of the SSD cache for each volume. Since XIV is being used in private and public cloud deployments, this offers the ability to offer premium performance at premium prices. The use of SSD is complementary to IBM XIV Quality of Service (QoS) performance levels, which are determined by host instead.
Well, that's the first major IBM System Storage launch of 2012. Let me know what you think in the comment section below.
Did you miss IBM Pulse 2013 this week? I wasn't there either, having scheduled visits with clients in Washington DC this week, only to have those meetings cancelled due to the [U.S. sequestration cuts].
Fortunately, there are plenty of videos and materials to review from the event. Here's a [12-minute video] interview between Laura DuBois, Program VP of Storage for industry analyst firm [IDC], and fellow IBM executive Steve "Woj" Wojtowecz, VP of Tivoli Storage and Networking Software.
(Update: Apparently, IBM had not secured re-distribution rights from IDC to post this video prior to my blog post. IBM now has full permission to distribute. My apologies for any inconvenience last week.)
The two discuss client opportunities and requirements for storage clouds and compute clouds. Client cloud storage requirements include backup and archive clouds, file storage clouds, and storage that supports compute cloud environments.
In my presentations in Australia and New Zealand, I mentioned that people were re-discovering the benefits of removable media. While floppy diskettes were a convenient way of passing information from one person to another, they unfortunately did not have enough capacity. In today's world, you may need Gigabytes or Terabytes of re-writeable storage with a file system interface that can easily be passed from one person to another. In this post, I explore three options.
(FCC Disclaimer: I work for IBM, and IBM has no business relationship with Cirago at the time of this writing. Cirago has not paid me to mention their product, but instead provided me a free loaner that I promised to return to them after my evaluation is completed. This post should not be considered an endorsement for Cirago's products. List prices for Cirago and IBM products were determined from publicly available sources for the United States, and may vary in different countries. The views expressed herein may not necessarily reflect the views and opinions of either IBM or Cirago.)
I took a few photos so you can see what exactly this device looks like. Basically, it is a plastic box that holds a single naked disk drive. It has four little rubber feet so that it does not slip on your desk surface.
The inside is quite simple. The power and SATA connections match those of either a standard 3.5 inch drive, or the smaller form factor (SFF) 2.5 inch drive. However, to my dismay, it does not handle EIDE drives which I have a ton of. After taking apart six different computer systems, I found only one had SATA drives for me to try this unit out with.
The unit comes with a USB cable and AC/DC power adapter. In my case, I found the USB 3.0 cable too short for my liking. My tower systems are under my desk, but I like keeping docking stations like this on the top of the desk, within easy reach, but that wasn't going to happen because the USB cable was not long enough.
Instead, I ended up putting it half-way in between, behind my desk, sitting on another spare system. Not ideal, but in theory there are USB-extension cables that probably could fix this.
Here it is with the drive inside. I had a 3.5 inch Western Digital [1600AAJS drive] 160 GB, SATA 3 Gbps, 8 MB Cache, 7200 RPM.
For the comparison, I used a dual-core AMD [Athlon X2] system that I had built for my 2008 [One Laptop Per Child] project. I ran the tests with the drive externally in the Cirago docking station, then ran the same tests with the same drive internally on the native SATA controller. Although the Cirago documentation indicated that Windows was required, I used Ubuntu Linux 10.04 LTS just fine, using the flexible I/O [fio] benchmarking tool against an ext3 file system.
Sequential Write - a common use for external disk drive is backup.
Random read - randomly read files ranging from 5KB to 10MB in size.
Random mixed - randomly read/write files (50/50 mix) ranging from 5KB to 10MB in size.
(Results table: read and write latency in msec, and read and write bandwidth in KB/s, including the Random Mixed (50/50) workload.)
For sequential write, the Cirago performed well, only about 15 percent slower than native SATA. For random workloads, however, it was 30-40 percent slower. If you are wondering why I did not get USB 3.0 speeds, there are several factors involved here. First, with overheads, 5 Gbps USB 3.0 is expected to get only about 400 MB/sec. My SATA 2.0 controller maxes out at 375 MB/sec, and my USB 2.0 ports on my system are rated for 57 MB/sec, but with overheads will only get 20-25 MB/sec. Most spinning drives only get 75 to 110 MB/sec. Even solid-state drives top out at 250 MB/sec for sustained activity. Despite all that, my internal SATA drive only got 16 MB/sec, and externally with the Cirago 14 MB/sec in sustained write activity.
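The interface numbers above are easier to compare when converted to the same units. Here is a small Python sketch of that arithmetic; the function names are mine, the conversion is the raw line rate divided by eight bits per byte, and it deliberately ignores encoding and protocol overheads, which is why real-world figures come in lower.

```python
def mbps_to_mb_per_sec(megabits_per_sec):
    """Raw line rate in Mbit/s converted to MB/s, before any overhead."""
    return megabits_per_sec / 8


def percent_slower(baseline_mb_s, measured_mb_s):
    """How much slower the measured rate is, relative to the baseline."""
    return (baseline_mb_s - measured_mb_s) / baseline_mb_s * 100


print("USB 3.0 (5 Gbps):   ", mbps_to_mb_per_sec(5000), "MB/sec raw")
print("SATA 2.0 (3 Gbps):  ", mbps_to_mb_per_sec(3000), "MB/sec raw")
print("USB 2.0 (480 Mbps): ", mbps_to_mb_per_sec(480), "MB/sec raw")

# Using the rounded sustained-write figures quoted above: 16 MB/sec
# internal versus 14 MB/sec in the docking station.
print("Dock vs. internal:  ", percent_slower(16, 14), "percent slower")
```

With those rounded figures the gap works out to 12.5 percent, in the same ballpark as the "about 15 percent" from the full test results. Either way, the drive itself, not the interface, is the limiting factor here.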
Here is the mess that is inside my system. The slot for drive 2 was blocked by cables, memory chips and the heat sink for my processor. It is possible to damage a system just trying to squeeze between these obstacles.
However, the point of this post is "removable media". Having to open up the case and insert the second drive and wire it up to the correct SATA port was a pain, and certainly a more difficult challenge than the average PC user wishes to tackle.
Price-wise, the Cirago lists for $49 USD, and the 160GB drive I used lists for $69, so the combination $118 is about what you would pay for a fully integrated external USB drive. However, if you had lots of loose drives, then this could be more convenient and start to save you some money.
IBM RDX disk backup system
Another problem with the Cirago approach is that the disk drives are naked, with printed circuit board (PCB) exposed. When not in the docking station, where do you put your drive? Did you keep the [anti-static ESD bag] that it came in when you bought it? And once inside the bag, now what? Do you want to just stack it up in a pile with your other pieces of equipment?
To solve this, IBM offers the RDX backup system. These are fully compatible with other RDX systems from Dell, HP, Imation, NEC, Quantum, and Tandberg Data. The concept is to have a docking station that takes removable, rugged plastic-coated disk-enclosed cartridges. The docking station can be part of the PC itself, similar to how CD/DVD drives are installed, or as a stand-alone USB 2.0 system, capable of processing data up to 25 MB/sec.
The idea is not new. About 10 years ago we had [Iomega "zip" drives] that offered disk-enclosed cartridges with capacities of 100, 250 and 750MB. Iomega had its fair share of problems with the zip drive, which was ranked in 2006 as the 15th worst technology product of all time, and the company was eventually bought out by EMC two years later (as if EMC has not had enough failures on its own!).
The problem with zip drives was that they did not hold as much as CD or DVD media, and were more expensive. By comparison, IBM RDX cartridges come in sizes from 160GB to 750GB, at list prices starting at $127 USD.
IBM LTO tape with Long-Term File System
Removable media is not just for backup. Disk cartridges, like the IBM RDX above, have the advantage of being random access, but most tapes are accessed sequentially. IBM has solved this also, with the new IBM Long Term File System [LTFS], available for LTO-5 tape cartridges.
With LTFS, the LTO-5 tape cartridge now can act as a super-large USB memory stick for passing information from one person to the next. The LTO-5 cartridge can handle up to 3TB of compressed data at up to SAS speeds of 140 MB/sec. An LTO-5 tape cartridge lists for only $87 USD.
The LTO-5 drives, such as the IBM [TS2250 drive], can read LTO-3, LTO-4 and LTO-5 cartridges, and can write LTO-4 and LTO-5 cartridges, in a manner that is fully compatible with LTO drives from HP or Quantum. LTO-3, LTO-4 and LTO-5 cartridges are available in WORM or rewriteable formats. LTO-4 and LTO-5 cartridges can be encrypted with 256-bit AES built-in encryption. With three drive manufacturers, and seven cartridge manufacturers, there is no threat of vendor lock-in with this approach.
These three options offer various trade-offs in price, performance, security and convenience. Not surprisingly, tape continues to be the cheapest option.
Continuing my catch-up on past posts, Jon Toigo, on his DrunkenData blog, posted a ["bleg"] for information about deduplication. The responses come from the "who's who" of the storage industry, so I will provide IBM's view. (Jon, as always, you have my permission to post this on your blog!)
Please provide the name of your company and the de-dupe product(s) you sell. Please summarize what you think are the key values and differentiators of your wares.
IBM offers two different forms of deduplication. The first is IBM System Storage N series disk system with Advanced Single Instance Storage (A-SIS), and the second is IBM Diligent ProtecTier software. Larry Freeman from NetApp already explains A-SIS in the [comments on Jon's post], so I will focus on the Diligent offering in this post. The key differentiators for Diligent are:
Data agnostic. Diligent does not require content-awareness, format-awareness nor identification of backup software used to send the data. No special client or agent software is required on servers sending data to an IBM Diligent deployment.
Inline processing. Diligent does not require temporarily storing data on back-end disk to post-process later.
Scalability. Up to 1PB of back-end disk managed with an in-memory dictionary.
Data Integrity. All data is diff-compared for full 100 percent integrity. No data is accidentally discarded based on assumptions about the rarity of hash collisions.
InfoPro has said that de-dupe is the number one technology that companies are seeking today — well ahead of even server or storage virtualization. Is there any appeal beyond squeezing more undifferentiated data into the storage junk drawer?
Diligent is focused on backup workloads, which have the best opportunity for deduplication benefits. The two main benefits are:
Keeping more backup data available online for fast recovery.
Mirroring the backup data to another remote location for added protection. With inline processing, only the deduplicated data is sent to the back-end disk, and this greatly reduces the amount of data sent over the wire to the remote location.
Every vendor seems to have its own secret sauce de-dupe algorithm and implementation. One, Diligent Technologies (just acquired by IBM), claims that their’s is best because it collapses two functions — de-dupe then ingest — into one inline function, achieving great throughput in the process. What should be the gating factors in selecting the right de-dupe technology?
As with any storage offering, the three gating factors are typically:
Will this meet my current business requirements?
Will this meet my future requirements for the next 3-5 years that I plan to use this solution?
What is the Total Cost of Ownership (TCO) for the next 3-5 years?
Assuming you already have backup software operational in your existing environment, it is possible to determine the necessary ingest rate: how many "Terabytes per Hour" (TB/h) must be received, processed and stored from the backup software during the backup window. IBM intends to document its performance test results of specific software/hardware combinations to provide guidance to clients' purchase and planning decisions.
For post-process deployments, such as the IBM N series A-SIS feature, the "ingest rate" during the backup only has to receive and store the data, and the rest of the 24-hour period can be spent doing the post-processing to find duplicates. This might be fine now, but as your data grows, you might find your backup window growing, and that leaves less time for post-processing to catch up. IBM Diligent does the processing inline, so is unaffected by an expansion of the backup window.
IBM Diligent can scale up to 1PB of back-end data, and the ingest rate does not suffer as more data is managed.
As for TCO, post-process solutions must have additional back-end storage to temporarily hold the data until the duplicates can be found. With IBM Diligent's inline methodology, only deduplicated data is stored, so less disk space is required for the same workloads.
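The ingest-rate sizing discussed above is simple arithmetic, sketched here in Python. The function names are my own, and the 40 TB nightly backup and the window lengths are hypothetical numbers chosen only to show the trade-off; substitute your own figures.

```python
def required_ingest_tb_per_hour(nightly_backup_tb, window_hours):
    """TB/h that must be received, processed and stored within the backup window."""
    return nightly_backup_tb / window_hours


def post_process_hours_available(window_hours, hours_per_day=24):
    """Hours left each day for a post-process design to find duplicates."""
    return hours_per_day - window_hours


# Hypothetical: 40 TB to back up each night, with a window that keeps growing.
for window in (8, 12, 16):
    rate = required_ingest_tb_per_hour(40, window)
    left = post_process_hours_available(window)
    print(f"{window}h window: {rate:.2f} TB/h ingest, {left}h left for post-processing")
```

As the window stretches from 8 to 16 hours, a post-process design has half as much time to catch up, while an inline design only needs to sustain the ingest rate.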
Despite the nuances, it seems that all block level de-dupe technology does the same thing: removes bit string patterns and substitutes a stub. Is this technically accurate or does your product do things differently?
IBM Diligent emulates a tape library, so the incoming data appears as files to be written sequentially to tape. A file is a string of bytes. Unlike block-level algorithms that divide files up into fixed chunks, IBM Diligent performs diff-compares of incoming data with existing data, and identifies ranges of bytes that duplicate what already is stored on the back-end disk. The file is then a sequence of "extents" representing either unique data or existing data. The file is represented as a sequence of pointers to these extents. An extent can vary from 2KB to 16MB in size.
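To illustrate the general idea of representing a file as a sequence of extents, here is a toy Python sketch. To be clear, this is not the Diligent algorithm: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for the diff-compare, works on a few dozen bytes rather than extents of 2KB to 16MB, and the names `to_extents`, `rebuild` and `min_match` are my own.

```python
from difflib import SequenceMatcher


def to_extents(incoming: bytes, stored: bytes, min_match: int = 4):
    """Describe `incoming` as extents: ("ref", offset, length) pointing into
    `stored`, or ("new", literal_bytes) for data not already stored."""
    matcher = SequenceMatcher(None, stored, incoming, autojunk=False)
    extents = []
    pos = 0  # how far into `incoming` we have described so far
    for m in matcher.get_matching_blocks():
        if m.size < min_match:
            continue  # too short to be worth a pointer (also skips the final dummy block)
        if m.b > pos:
            extents.append(("new", incoming[pos:m.b]))
        extents.append(("ref", m.a, m.size))
        pos = m.b + m.size
    if pos < len(incoming):
        extents.append(("new", incoming[pos:]))
    return extents


def rebuild(extents, stored: bytes) -> bytes:
    """Reconstitute the original bytes from the extent list."""
    out = bytearray()
    for extent in extents:
        if extent[0] == "ref":
            _, offset, length = extent
            out += stored[offset:offset + length]
        else:
            out += extent[1]
    return bytes(out)


stored = b"The quick brown fox jumps over the lazy dog"
incoming = b"The quick brown cat jumps over the lazy dog!"
extents = to_extents(incoming, stored)
print(extents)
print(rebuild(extents, stored) == incoming)
```

Only the "new" extents need fresh space on the back-end disk; the "ref" extents are just pointers, and rebuilding from them returns the original bytes exactly.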
De-dupe is changing data. To return data to its original state (pre-de-dupe) seems to require access to the original algorithm plus stubs/pointers to bit patterns that have been removed to deflate data. If I am correct in this assumption, please explain how data recovery is accomplished if there is a disaster. Do I need to backup your wares and store them off site, or do I need another copy of your appliance or software at a recovery center?
For IBM Diligent, all of the data needed to reconstitute the data is stored on back-end disks. Assuming that all of your back-end disks are available after the disaster, either the original or mirrored copy, then you only need the IBM Diligent software to make sense of the bytes written to reconstitute the data. If the data was written by backup software, you would also need compatible backup software to recover the original data.
De-dupe changes data. Is there any possibility that this will get me into trouble with the regulators or legal eagles when I respond to a subpoena or discovery request? Does de-dupe conflict with the non-repudiation requirements of certain laws?
I am not a lawyer, and certainly there are aspects of [non-repudiation] that may or may not apply to specific cases.
What I can say is that storage is expected to return back a "bit-perfect" copy of the data that was written. There are laws against changing the format. For example, an original document was in Microsoft Word format, but is converted and saved instead as an Adobe PDF file. In many conversions, it would be difficult to recreate the bit-perfect copy. Certainly, it would be difficult to recreate the bit-perfect MS Word format from a PDF file. Laws in France and Germany specifically require that the original bit-perfect format be kept.
Based on that, IBM Diligent is able to return a bit-perfect copy of what was written, same as if it were written to regular disk or tape storage, because all data is diff-compared byte-for-byte with existing data.
In contrast, other solutions based on hash codes have collisions that result in presenting a completely different set of data on retrieval. If the data you are trying to store happens to have the same hash code calculation as completely different data already stored on a solution, then it might just discard the new data as "duplicate". The chance for collisions might be rare, but could be enough to put doubt in the minds of a jury. For this reason, IBM N series A-SIS, that does perform hash code calculations, will do a full byte-for-byte comparison of data to ensure that data is indeed a duplicate of an existing block stored.
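The difference between trusting a hash code and verifying byte-for-byte can be shown with a toy Python example. Everything here is an illustration of the principle, not any vendor's implementation: the `DedupStore` class is hypothetical, and `weak_hash` is deliberately terrible so that a collision is easy to produce. Real products use strong hashes where collisions are extremely rare.

```python
def weak_hash(block: bytes) -> int:
    """Deliberately weak hash (sum of bytes mod 256) so collisions are easy to show."""
    return sum(block) % 256


class DedupStore:
    def __init__(self, verify: bool):
        self.verify = verify  # True: compare bytes before declaring a duplicate
        self.blocks = {}      # hash code -> list of distinct blocks with that code

    def write(self, block: bytes):
        """Store a block and return a (hash, index) reference to it."""
        h = weak_hash(block)
        bucket = self.blocks.setdefault(h, [])
        if self.verify:
            for i, existing in enumerate(bucket):
                if existing == block:  # byte-for-byte comparison
                    return (h, i)
            bucket.append(block)
            return (h, len(bucket) - 1)
        if not bucket:
            bucket.append(block)
        return (h, 0)  # hash-only: same code is assumed to mean same data

    def read(self, ref):
        h, i = ref
        return self.blocks[h][i]


# b"ab" and b"ba" are different data with the same weak hash code.
for verify in (False, True):
    store = DedupStore(verify)
    store.write(b"ab")
    ref = store.write(b"ba")
    print("verify =", verify, "-> read back", store.read(ref))
```

The hash-only store silently hands back the wrong data for the second write, while the verifying store keeps both blocks and returns exactly what was written.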
Some say that de-dupe obviates the need for encryption. What do you think?
I disagree. I've been to enough [Black Hat] conferences to know that it would be possible to read the data off the back-end disk, using a variety of forensic tools, and piece together strings of personal information, such as names, social security numbers, or bank account codes.
Currently, IBM provides encryption on real tape (both TS1120 and LTO-4 generation drives), and is working with open industry standards bodies and disk drive module suppliers to bring similar technology to disk-based storage systems. Until then, clients concerned about encryption should consider OS-based or application-based encryption from the backup software. IBM Tivoli Storage Manager (TSM), for example, can encrypt the data before sending it to the IBM Diligent offering, but this might reduce the number of duplicates found if different encryption keys are used.
Some say that de-duped data is inappropriate for tape backup, that data should be re-inflated prior to write to tape. Yet, one vendor is planning to enable an “NDMP-like” tape backup around his de-dupe system at the request of his customers. Is this smart?
Re-constituting the data back to the original format on tape allows the original backup software to interpret the tape data directly to recover individual files. For example, IBM TSM software can write its primary backup copies to an IBM Diligent offering onsite, and have a "copy pool" on physical tape stored at a remote location. The physical tapes can be used for recovery without any IBM Diligent software in the event of a disaster. If the IBM Diligent back-end disk images are lost, corrupted, or destroyed, IBM TSM software can point to the "copy pool" and be fully operational. Individual files or servers could be restored from just a few of these tapes.
An NDMP-like tape backup of a deduplicated back-end disk would require that all the tapes are intact, available, and fully restored to new back-end disk before the deduplication software could do anything. If a single cartridge from this set was unreadable or misplaced, it might impact the access to many TBs of data, or render the entire system unusable.
In the case of 1PB of back-end disk for IBM Diligent, you would have to recover over a thousand tapes back to disk before you could recover any individual data from your backup software. Even with dozens of tape drives in parallel, it could take you several days for the complete process. This represents a longer "Recovery Time Objective" (RTO) than most people are willing to accept.
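A back-of-the-envelope Python calculation shows where "over a thousand tapes" and "several days" come from. The inputs are my assumptions for the sake of the sketch: LTO-4 cartridges at their native 800GB capacity and 120 MB/sec native drive speed, 24 drives running in parallel, and no allowance for mounts, seeks or retries.

```python
import math


def cartridges_needed(total_bytes, cartridge_bytes):
    """Number of cartridges needed to hold the data, rounded up."""
    return math.ceil(total_bytes / cartridge_bytes)


def full_restore_days(total_bytes, drives, mb_per_sec_per_drive):
    """Days to stream everything back with all drives running flat out."""
    seconds = total_bytes / (drives * mb_per_sec_per_drive * 1e6)
    return seconds / 86400


PETABYTE = 10**15
LTO4_NATIVE_BYTES = 800 * 10**9   # assumption: uncompressed LTO-4 cartridges
LTO4_NATIVE_MB_S = 120            # assumption: native LTO-4 drive speed
DRIVES = 24                       # assumption: "dozens" taken as two dozen

print("Cartridges:", cartridges_needed(PETABYTE, LTO4_NATIVE_BYTES))
print("Days to restore: %.1f" % full_restore_days(PETABYTE, DRIVES, LTO4_NATIVE_MB_S))
```

Under those assumptions, that is 1,250 cartridges and roughly four days of non-stop streaming before the first file can be recovered.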
Some vendors are claiming de-dupe is “green” — do you see it as such?
Certainly, "deduplicated disk" is greener than "non-deduplicated" disk, but I have argued in past posts, supported by Analyst reports, that it is not as green as storing the same data on "non-deduplicated" physical tape.
De-dupe and VTL seem to be joined at the hip in a lot of vendor discussions: Use de-dupe to store a lot of archival data on line in less space for fast retrieval in the event of the accidental loss of files or data sets on primary storage. Are there other applications for de-duplication besides compressing data in a nearline storage repository?
Deduplication can be applied to primary data, as in the case of the IBM System Storage N series A-SIS. As Larry suggests, MS Exchange and SharePoint could be good use cases that represent the possible savings for squeezing out duplicates. On the mainframe, many master-in/master-out tape applications could also benefit from deduplication.
I do not believe that deduplication products will run efficiently with “update in place” applications, that is, high levels of random writes for non-appending updates. OLTP and Database workloads would not benefit from deduplication.
Just suggested by a reader: What do you see as the advantages/disadvantages of software based deduplication vs. hardware (chip-based) deduplication? Will this be a differentiating feature in the future… especially now that Hifn is pushing their Compression/DeDupe card to OEMs?
In general, new technologies are introduced on software first, and then as implementations mature, get hardware-based to improve performance. The same was true for RAID, compression, encryption, etc. The Hifn card does "hash code" calculations that do not benefit the current IBM Diligent implementation. Currently, IBM Diligent performs LZH compression through software, but certainly IBM could provide hardware-based compression with an integrated hardware/software offering in the future. Since IBM Diligent's inline process is so efficient, the bottleneck in performance is often the speed of the back-end disk. IBM Diligent can get improved "ingest rate" using FC instead of SATA disk.
Sorry, Jon, that it took so long to get back to you on this, but since IBM had just acquired Diligent when you posted, it took me a while to investigate and research all the answers.
Guest Post: The following post was written by Tom Rauchut, IBM Infrastructure Architect and Advanced Technical Sales Specialist for Tivoli Automation. Tom is at IBM Pulse 2011 in Las Vegas this week, and has offered to send his observations.
The expo opened last night. There are so many fantastic demos and product experts. Las Vegas has a Tivoli buzz on right now.
In this case, it is not chess pieces, but FUD being slung around like mud between vendors. EMC blogger Chuck Hollis' post [Products vs. Features] correctly points out that IBM has invented nearly everything useful in IT, and sadly a few things we wish we hadn't. Gene Amdahl, who left IBM to start his own company, is credited for coining the phrase describing IBM's innovative sales techniques. Wikipedia has a nice write up on the history of [Fear, Uncertainty and Doubt (FUD)].
Nowadays, when you hear "FUD" most storage administrators immediately think of EMC, who have taken this method to a new level of art-form. Take for example two EMC entries from fellow blogger BarryB, on his Storage Anarchist blog: [Not Dead Yet, and Pushing Daisies]. The first is a reference to a funny scene from a Monty Python movie, and the second one is referring to a terrible new television program called "Pushing Daisies". (In this show, the main character can bring a dead person back to life for sixty seconds, just long enough to ask a few questions on behalf of his detective friend. He must touch the person again within 60 seconds, or someone else randomly dies instead. I am not a fan of this concept, and found it a bit morbid and creepy. But I digress.)
It is true I was on vacation the past two weeks, but this was group travel I booked over six months ago before we had the exact dates lined up for our various announcements, and not a last-minute celebration of my recent new job assignment. I got all my assignments for this announcement turned in before leaving for my trip. I never thought of checking with fellow IBM blogger BarryW to make sure that we don't have overlapping vacation schedules, leaving the "blogosphere" unmanned, so to speak, but it is not a bad idea. Fortunately, our IBM PR team was able to make their rebuttal through other means. You can read the recap on Techworld [Marketing Wars by Proxy].
Several astute readers on my blog, however, requested that I add my two cents. Let's take a look at some of BarryB's comments:
...most DS8300's are to this day most frequently bundled as "free" storage with IBM mainframe and server sales.
We just shipped our 15,000th box, so for this absurd statement to be true, more than half would have to be given away as part of a server-and-storage deal? Actually, about a third of our DS8000 sales are sold with servers in the same bundle, and while we do provide discounts from the official list price, that is not the same as "free". The other two thirds are sold into accounts to be used with the existing servers already deployed. So BarryB, your math doesn't work out. (Perhaps you've been taking Hitachi math lessons???)
It is interesting however, that when we do a 4-year TCO comparison, between a normally-discounted DS8000 versus free EMC DMX4 hardware, IBM still has the lower cost, given that most of the price-gouging from EMC happens after the initial sale, through software features, annual Powerpath renewals and MES upgrades. If you are an EMC customer, and you are planning to add more capacity to your DMX, ask EMC to charge you no more than what you originally paid on a dollar-per-GB basis for the initial capacity. That's only fair, right?
...No thin provisioning, or even a commitment to thin provisioning. Just crickets. (Celerra support since Jan 2006...
EMC DMX does not have thin provisioning available today either, so BarryB brings up Celerra, their NAS box? IBM System Storage N series NAS box also has thin provisioning, so if you want thin provisioning you can buy a NAS box from EMC or IBM. Thin provisioning makes sense using NAS protocols, as there are actual commands to "delete a file" that can then free up the related blocks in a thin-provisioned environment. The only way to do this with block-oriented protocols is to get the OS to notify the storage device that blocks can be freed up. As it turns out, IBM's z/OS has such support, which we developed specifically for our thin-provisioning support in our IBM RAMAC Virtual Array disk systems back in the 1990s. For block-oriented devices on most other operating systems, thin provisioning may not be all that it is cracked up to be.
No SATA drives (only DMX-4 supports native SATA-II drives, since Aug’07)
A few people are confused on this. IBM DS8000 has supported FATA for quite some time now, same slower speeds and higher capacities as SATA, but are technically NOT the same as SATA. FATA are designed to provide better protection against vibrational shock, to improve reliability of the drives. IBM felt that if the data was important enough to put on a high-end system, it should get better-than-SATA treatment. If you really want SATA, try our IBM System Storage N series, DS4000 or DS3000 models.
No RAID 6 (DMX-3 has supported multi-dimensional RAID since Q1’07, DMX-4 since Aug'07, ...
IBM N series supports RAID6, but we called it RAID-DP and that confused some people. Same thing, DP stands for Dual Parity, protecting against a double-disk failure. We also just announced RAID6 on our DS4000 series, by the way.
No 4Gb back-end (USP-V since May '07, DMX-4 since Aug’07)
I found this one odd, since BarryB himself explained in an earlier post, [DMX-4 and Oh So Much More], why a 4Gbps back-end made no difference to DMX4 performance. I will put his text into a different color so you can tell it is from a different post:
You may have noticed that there weren't any specific performance claims attributed to the new 4Gb FC back-end. This wasn't an oversight, it is in fact intentional. The reality is that when it comes to massive-cache storage architectures, there really isn't that much of a difference between 2Gb/s transfer speeds and 4Gb/s. Transmit times are really only a tiny portion of I/O overhead, and just don't make that much difference when a massively-cached system is pre-fetching reads, buffering/delaying writes and reordering I/O requests to minimize seek times. Not that 4Gb/s won't help some applications, but most people just won't see any noticeable difference.
In this case, BarryB is right. The IBM DS8000's 2Gbps back-end is not a performance bottleneck. The DS8000 with a 2Gbps back-end is faster than DMX4 with a 4Gbps back-end for business application workloads. EMC doesn't publish SPC benchmarks to deny this, so you will just have to take our word on this.
Still only 1024 maximum disk drives (DMX-3 & 4 support up to 2400 drives, USP-V supports 1152)
I would be curious to see how many customers have more than 1024 drives on any high-end disk array. As we learned back in [Day 2 Storage Symposium], the average DS8100 has 17.4 TB, and DS8300 has 41.5 TB capacity. Using 500GB drives, that's only 83 spindles. Even with 73GB drives, that's 568 spindles. Plenty of room for growth, so I am not convinced that higher theoretical upper architectural limits are worth discussing here.
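For anyone who wants to check the spindle counts, here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 1000 GB) and ignores RAID overhead and spare drives. The function name and the constant are just illustrative.

```python
MAX_DS8000_DRIVES = 1024  # the architectural limit quoted in the complaint

def spindles_needed(capacity_tb, drive_gb):
    """Approximate drive count for a given capacity (no RAID or spares)."""
    return round(capacity_tb * 1000 / drive_gb)

# Average DS8300 capacity cited above: 41.5 TB
for drive_gb in (500, 73):
    count = spindles_needed(41.5, drive_gb)
    share = 100.0 * count / MAX_DS8000_DRIVES
    print(f"{drive_gb}GB drives: {count} spindles ({share:.0f}% of the limit)")
```

Even the small-drive case lands well under the 1024-drive ceiling, which is the point of the paragraph above.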
Still only two HARD LPARs (partitions) ..., and even IBM’s mid-tier products support more than 2 storage partitions (in this same announcement)
IBM's two LPARs are TWICE what EMC DMX offers. I don't even know why anyone from EMC would bring this up. While EMC is enjoying their success with VMware, they lack the experience to carry this over to their storage lines. Until EMC offers MORE THAN TWO of any kind of partitions on their high-end offerings, there just is no credibility here. As for our "storage partitions" on our DS4000 line, that is an unfortunate misunderstanding of the press release. On the DS4000, the term "storage partition" is really "LUN masking", dividing up only which disks can be accessed by which hosts, and not dividing up any processor or cache capacity. So this is not the same as any LPAR concept on any other system. For example, a DS4000 with 64 partitions can be attached to 64 hosts, or 64 host-clusters like a Windows MSCS environment or AIX HACMP.
No native Ethernet replication or iSCSI support (Symmetrix has had since 2002)
Again, I found this one odd. On another EMC post, [Vigorous Debates], Chad Sakac mentions that only 2% of Symmetrix are sold with IP ports. I am not sure if this is for Ethernet replication, iSCSI attachment, or both (again, I will use a different color):
On the Symm business (a huge part of EMC’s business – the IP ports are included on 2% of deals. That’s a fact.
Just because an engineer can put a feature or function on a box doesn't mean there is business sense to do so. I would hate for IBM to invest millions of dollars on native iSCSI support, only to have 2% of our DS8000 boxes sold with that feature. Customers who have DS8000 on FC SANs already deployed can easily add iSCSI support either through their SAN switches, or by fronting the DS8000 with an N series gateway. Most customers looking for native iSCSI are the smaller no-SAN-deployed SMB customers, and for them, we have both the DS3300 and the various N series models to choose from.
Well that's my two cents. The DS8000 series remains a strategic part of the IBM System Storage offering matrix, with continued investment in the development, as well as on-going research that we can leverage throughout the IBM company. I would like to read your thoughts on this, post me a comment below.
This week I was aboard the Queen Mary in Long Beach, California! This was a business event organized by [Key Info Systems], a valued IBM Business Partner. Key Info resells IBM servers, storage and switches.
The Queen Mary retired in 1967, and has been converted into a hotel and events venue. The locals just parked their car and walked on board, but I got to stay Tuesday through Thursday in one of the cabins. It was long and narrow, with round windows! There were four dials for the bathtub: Cold Salt, Hot Fresh, Cold Fresh, and Hot Salt.
Stepping on the boat was like walking back in time through history! If you decide to go see it, check out the [Art Deco bar] at the front of the Promenade deck. The ship is still in the water, but is permanently docked. It is sectioned off to prevent the ocean waves from affecting it, so we did not have the nauseating back-and-forth motion normally associated with cruise ships.
(It is with a bit of irony that we are on the Queen Mary just days after the tragedy of the [Costa Concordia], the largest Italian cruise ship, which ran aground near Isola del Giglio. The captain will have to explain how he [fell into a lifeboat] before he had a chance to wait for everyone else to get safely off the shipwreck. He was certainly no [Captain Sully]! I am thankful that most of the 4,200 people survived the incident.)
Lief Morin, Founder and Chief Executive for Key Info Systems, kicked off the meeting with highlights of 2011 successes. I have known Lief for years, as Key Info comes to the Tucson EBC on a frequent basis. This event was designed to give his sellers an update of what is the latest for each product line, and what to look forward to in the next 12-18 months.
The next speaker was from Vision Solutions, which provides High Availability solutions for IBM i on Power Systems. In 2010, their company nearly doubled in size with the acquisition of Double-Take, which provides data replication for x86 servers running Windows, Linux, VMware, Hyper-V and other hypervisors. The capabilities of Double-Take sounded similar to what IBM offers with [Tivoli Storage Manager FastBack] and [Tivoli Storage Manager for Virtual Environments].
Dinner at Sir Winston's
Rather than take the "Ghosts and Legends" tour, I opted for dinner at the Queen Mary's signature restaurant, Sir Winston's. This is a fancy place, so dress accordingly. If you want the Raspberry soufflé, order it early as it takes 30 minutes to prepare!
[Storwize V7000], including the new Storwize V7000 Unified configuration
Storage is an important part of the Key Info Systems revenue stream, so I was glad to have lots of questions and interactions from the audience.
Murder Mystery Dinner
The acting troupe from [Dinner Detective] put on quite the show for us! With all that is going on in the world, it is good to laugh out loud every now and then.
In other murder mystery dinners I have participated in, each person is assigned a "character" and given a script of what to say and when to say it. This was different, we got to pick our own characters. I chose "Doctor Watson", from the Sherlock Holmes series. Several attendees thought it was a double meaning with [IBM Watson], the computer that figured out the clues on Jeopardy! television game show, and has since been [put to work at Wellpoint] to help out the Healthcare industry.
After the "murder" happened, two actors portraying policemen selected members of the audience to answer questions. We didn't get a script of what to say, so everyone had to "ad lib". I was singled out as a suspect, and had fun playing along in character. One of the attendees afterwards said he was impressed that I was able to fabricate such amusing and elaborate responses to their personal and embarrassing questions. As a public speaker for IBM, I have had a lot of practice thinking quickly on my feet.
Fibre Channel and Ethernet Switches
The next two speakers gave us an update on Fibre Channel and Ethernet switches, and their thoughts on the inevitability of Fibre Channel over Ethernet (FCoE). One of the exciting new developments is the [Brocade Network Subscription] which creates a flexible pay-per-use Ethernet port rental model for customers. This is especially timely given the Financial Accounting Standards Board proposed [FASB Change 13] that affects operating leases in the balance sheet.
With the Brocade Network Subscription, you pay monthly for the ports you are using. Need more ports? Brocade will install the added gear. Use fewer ports? Brocade will take the equipment back. There is no term endpoint or residual value like traditional leasing, so when you are done using the equipment, give it back any time. This is ideal for companies that may need to have a lot of Ethernet ports for the next 2-3 years, but then plan to taper down, and don't want to get stuck with a long-term commitment or capital depreciation.
The last speaker was from VMware. IBM is the #1 reseller of VMware, and VMware commands an impressive 81 percent market share in the x86 virtualization space. The speaker presented VMware's strategy going forward, which aligns well with IBM's own strategy, to help companies Cloud-enable their existing IT infrastructures, in preparation for eventual moves to Hybrid or Public cloud deployments.
Special thanks to Lief Morin for sponsoring this event, Raquel Hernandez from IBM for coordinating my travel, and Pete, Christina and Kendrell from Key Info Systems for organizing the activities!
Each quarter since 2006, the [IBM Migration Factory] team has tallied the number of clients who have moved to IBM servers and storage systems from competitive hardware. Well, I've just seen the latest numbers, for the third quarter of 2010, and it looks like we set a new quarterly record with nearly 400 total migrations to IBM from Oracle/Sun and HP.
It's clear that companies and governments worldwide are seeing greater value in IBM systems, while Oracle and HP watch their customer bases erode. In just this past 3Q 2010, nearly 400 clients have moved over to IBM -- almost all of them from Oracle/Sun and HP. Of these, 286 clients migrated to IBM Power Systems, running AIX, Linux and IBM i operating systems, from competitors alone -- nearly 175 from Oracle/Sun and nearly 100 from HP. The number of migrations to IBM Power Systems through the first three quarters of 2010 is nearly 800, already exceeding the total for all of last year by more than 200.
Let's do the math.... Since IBM established its Migration Factory program in 2006, more than 4,500 clients have switched to IBM. More than 1,000 from Oracle/Sun and HP joined the exodus this year alone. In less than five years, almost 3,000 of these clients -- including more than 1,500 from Oracle/Sun and more than 1,000 from HP -- have chosen to run their businesses on IBM's Power Systems. That's more than a client per day making the move to IBM!
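Here is a quick sketch of that math. It treats "less than five years" as a full five years of 365 days, so the rates it prints are conservative lower bounds. The rounded client counts (4,500 and 3,000) are the approximate figures quoted above, and the function name is just illustrative.

```python
DAYS_PER_YEAR = 365

def clients_per_day(clients, years):
    """Average number of migrating clients per calendar day."""
    return clients / (years * DAYS_PER_YEAR)

all_migrations = clients_per_day(4500, 5)    # all moves to IBM since 2006
power_migrations = clients_per_day(3000, 5)  # moves to Power Systems only

print(f"All migrations:  {all_migrations:.2f} clients per day")
print(f"Power Systems:   {power_migrations:.2f} clients per day")
```

Either way you slice it, the rate stays above one client per day.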
And as the servers go, so goes the storage. Clients are re-discovering IBM as a server and storage powerhouse, offering a strong portfolio in servers, disk and tape systems, and how synergies between servers and storage can provide them real business benefits.
Adding it all up, it's clear that IBM's multi-billion dollar investment in helping to build a smarter planet with workload-optimized systems is paying off -- and that, more and more, clients are selecting IBM over the competition to help them meet their business needs.
Continuing my week in Washington DC for the annual [2010 System Storage Technical University], I presented a session on Storage for the Green Data Center, and attended a System x session on Greening the Data Center. Since they were related, I thought I would cover both in this post.
Storage for the Green Data Center
I presented this topic in four general categories:
Drivers and Metrics - I explained the three key drivers for consuming less energy, and the two key metrics: Power Usage Effectiveness (PUE) and Data Center Infrastructure Efficiency (DCiE).
Storage Technologies - I compared the four key storage media types: Solid State Drives (SSD), high-speed (15K RPM) FC and SAS hard disk, slower (7200 RPM) SATA disk, and tape. I had comparison slides that showed how IBM disk was more energy efficient than competition, for example DS8700 consumes less energy than EMC Symmetrix when compared with the exact same number and type of physical drives. Likewise, IBM LTO-5 and TS1130 tape drives consume less energy than comparable HP or Oracle/Sun tape drives.
Integrated Systems - IBM combines multiple storage tiers in a set of integrated systems managed by smart software. For example, the IBM DS8700 offers [Easy Tier] to offer smart data placement and movement across Solid-State drives and spinning disk. I also covered several blended disk-and-tape solutions, such as the Information Archive and SONAS.
Actions and Next Steps - I wrapped up the talk with actions that data center managers can take to help them be more energy efficient, from deploying the IBM Rear Door Heat Exchanger to improving the management of their data.
Greening of the Data Center
Janet Beaver, IBM Senior Manager of Americas Group facilities for Infrastructure and Facilities, presented on IBM's success in becoming more energy efficient. The price of electricity has gone up 10 percent per year, and in some locations, 30 percent. For every 1 Watt used by IT equipment, there are an additional 27 Watts for power, cooling and other uses to keep the IT equipment comfortable. At IBM, data centers represent only 6 percent of total floor space, but 45 percent of all energy consumption. Janet covered two specific data centers, Boulder and Raleigh.
At Boulder, IBM keeps 48 hours reserve of gasoline (to generate electricity in case of outage from the power company) and 48 hours of chilled water. Many power outages are less than 10 minutes, which can easily be handled by the UPS systems. At least 25 percent of the Computer Room Air Conditioners (CRAC) are also on UPS, so that there is some cooling during those minutes, within the ASHRAE guidelines of 72-80 degrees Fahrenheit. Since gasoline gets stale, IBM runs the generators once a month, which serves as a monthly test of the system, and clears out the lines to make room for fresh fuel.
The IBM Boulder data center is the largest in the company: 300,000 square feet (the equivalent of five football fields)! Because of its location in Colorado, IBM enjoys "free cooling" using outside air temperature 63 percent of the year, resulting in a PUE rating of 1.3. Electricity is only 4.5 US cents per kWh. The center also uses 1 million kWh per year of wind energy.
The Raleigh data center is only 100,000 square feet, with a PUE rating of 1.4. The Raleigh area enjoys 44 percent "free cooling" and electricity costs 5.7 US cents per kWh. The Leadership in Energy and Environmental Design [LEED] program has been updated to certify data centers. The IBM Boulder data center has achieved LEED Silver certification, and IBM Raleigh data center has LEED Gold certification.
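To tie these numbers back to the two metrics from my own session, here is a rough sketch. PUE is total facility power divided by IT equipment power, and DCiE is simply its reciprocal expressed as a percentage. The 1,000 kW IT load is a made-up figure for illustration only, not something Janet presented, and the function names are mine.

```python
HOURS_PER_YEAR = 8760

def dcie_percent(pue):
    """DCiE is the reciprocal of PUE, expressed as a percentage."""
    return 100.0 / pue

def annual_energy_cost(it_load_kw, pue, cents_per_kwh):
    """Yearly electricity bill in US dollars for a constant IT load."""
    facility_kw = it_load_kw * pue  # IT load plus power and cooling overhead
    return facility_kw * HOURS_PER_YEAR * cents_per_kwh / 100.0

# (PUE, US cents per kWh) as quoted for each site
sites = {"Boulder": (1.3, 4.5), "Raleigh": (1.4, 5.7)}
IT_LOAD_KW = 1000  # hypothetical IT load, for comparison only

for name, (pue, cents) in sites.items():
    cost = annual_energy_cost(IT_LOAD_KW, pue, cents)
    print(f"{name}: DCiE {dcie_percent(pue):.1f}%, about ${cost:,.0f} per year")
```

A lower PUE and cheaper electricity compound each other, which is why both show up among the site selection criteria.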
Free cooling, electricity costs, and disaster susceptibility are just three of the 25 criteria IBM uses to locate its data centers. In addition to the 7 data centers it manages for its own operations, and 5 data centers for web hosting, IBM manages over 400 data centers of other clients.
It seems that Green IT initiatives are more important to the storage-oriented attendees than the x86-oriented folks. I suspect that is because many System x servers are deployed in small and medium businesses that do not have data centers, per se.
I have arrived safely in San Francisco, and was able to check in at the hotel, pick up my registration badge for Oracle OpenWorld 2011, and attend the first keynote session. This is the largest Oracle OpenWorld event to date, with over 45,000 attendees from 117 different countries. There are 520,000 square feet of exhibition floor, and over 2,400 educational sessions. The conference is spread across the different buildings of the Moscone center, as well as nearby hotels. On average, attendees will walk seven miles during the week.
Larry Ellison was the keynote speaker for this first kick-off session. He focused almost exclusively on server and storage hardware. He feels that business is all about moving data, not doing integer math.
At the beginning of 2011, Oracle had only sold about 1,000 Exadata, but they have a sales target to sell an additional 3,000 Exadata boxes by year end.
The Exadata offers up to 10x columnar compression, and has 10x faster bandwidth (40Gbps Infiniband versus 4Gbps FCP). If you have a 100TB database, it would take up only 10TB of disk with this approach. He claims that the 90TB of disk you don't have to buy can then be used to buy more DRAM and/or Flash SSD.
(Realistically, since SSD is 15x more expensive than spinning disk, you can only purchase about 6TB of Flash for the 90TB you save on disk!)
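Here is the back-of-the-envelope version of that calculation, as a small sketch. It assumes the 10x compression ratio holds for the whole database and takes the 15x SSD-to-disk price multiple at face value. The function names are just illustrative.

```python
def disk_after_compression(db_tb, compression_ratio):
    """Disk capacity needed once the database is compressed."""
    return db_tb / compression_ratio

def flash_for_saved_disk(saved_disk_tb, ssd_price_multiple):
    """Flash capacity the same money buys, if SSD costs N times disk per TB."""
    return saved_disk_tb / ssd_price_multiple

db_tb = 100.0
on_disk_tb = disk_after_compression(db_tb, 10)  # 10x columnar compression
saved_tb = db_tb - on_disk_tb                   # disk you no longer buy
flash_tb = flash_for_saved_disk(saved_tb, 15)   # SSD at 15x the price of disk

print(f"On disk: {on_disk_tb:.0f}TB, saved: {saved_tb:.0f}TB, flash: {flash_tb:.0f}TB")
```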
Larry claims the design point for Exadata and Exalogic was to offer a system that was more powerful than IBM's fastest P795 computer, but cheaper than commodity x86 hardware. His secret is to "Parallel everything" for faster performance, and no single points of failure (SPOF). Exadata offers up to 10-50x faster query, and 4-10x faster OLTP. To keep costs low, Exadata uses all commodity hardware except the Infiniband. He cited various customer examples:
A company replaced 36 Teradata systems with 3 Exadata, and the result was that the application was 8x faster.
Banco Chile 9x faster than previous system
Deutsche Post 60x faster
Sogeti gets 60x faster backups.
French bank BNP Paribas 17x faster and no change to applications.
Procter & Gamble 18x faster
Merck 5x faster
Turkcell 250TB compressed to 25TB, 10x faster
The problem was that in each example, the comparison was against the customer's previous system, which varies and could have been an older Sun system, or an old system from HP, IBM or Dell. Perhaps it was a Freudian slip, but Larry mistakenly said "Paralyze" your applications, when he probably meant to "Parallelize".
Of all their 380,000 Oracle customers, 70 percent have SPARC/Solaris and/or Linux. Last week, Oracle announced the new SPARC-T4, which Larry claimed was 5x faster than the previous SPARC-T3. Larry feels that for the first time ever, a non-IBM CPU can challenge the long-standing reign of the IBM POWER series processor. Larry admitted that the IBM POWER7 chip actually did some tasks faster than the SPARC-T4, so his work is not yet done, but they plan to offer a new SPARC-T5 next year that will be 2x better than the SPARC-T4.
Larry compared the I/O bandwidth of servers based on SPARC-T4 to those based on POWER7, and found that the SPARC-T4 has double the I/O bandwidth, for a cost that was only about 1/4 the cost of a mainframe. IBM offers both: POWER7-based servers for CPU-intensive workloads, and System z (S/390)-based systems for I/O-intensive workloads. Larry feels that even though POWER7 is superior to SPARC-T4 for mathematical calculations, all business applications are focused on I/O bandwidth to move data, not computations.
Larry claims the new SPARC-T4 can do 1.2 million IOPS. He uses 40 Gbps Infiniband instead of traditional SAN-attached FCP solutions.
A new "box" called Exalytics combines their commodity hardware platform with a heuristic adaptive in-memory cache, their latest "me-too" solution that compares with what IBM already offers in [IBM SolidDB]. In fact, their me-too is not even internally developed, but rather the result of an acquisition of a company called "TimesTen". I thought it was interesting that the only piece of Oracle software mentioned during Larry's 90-minute speech was this piece of acquired technology. The new Exalytics product can start on a small rack and grow, analyzing relational data, non-relational OLAP, as well as unstructured documents. The result is what Larry called "the Speed of Light".
He also mentioned that Bob Shimp would kick off the Cloud later in the week. Over the past few years, Larry himself has gone from thinking that Cloud was a stupid, over-marketed term that nobody has deployed, to a complete believer, claiming that over 20 live demos will be given this year on Cloud.
Perhaps the funniest quote was his motivation to use Infiniband as the interconnect:
"Ethernet was invented by Xerox when I was a child."
-- Larry Ellison
Here are some sessions that IBM is featuring on Monday. Note the first two are Solution Spotlight sessions at the IBM Booth #1111 where I will be most of the time.
IBM Cloud Computing Solutions for Oracle
10/03/11, 10:30 a.m. – 11:00 a.m., Solution Spotlight, Booth #1111 Moscone South
Presenter: Chuck Calio,Technical Strategist, IBM Systems & Technology Group
IBM is recognized in the IT industry as one of the "Big 6" cloud providers, along with Amazon, Google, Microsoft, Salesforce and Yahoo. This session will highlight how IBM Cloud offerings apply to Oracle applications.
Lowering Cost and increasing efficiency in your long term support of Oracle EPM and BI
10/03/11, 3:00 p.m. -- 3:30 p.m., Solution Spotlight, Booth #1111 Moscone South
Presenter: Matthew Angelstad, IBM Global Business Solutions - Oracle EPM (Hyperion) Practice Lead
In 2007, Oracle acquired Hyperion, a leading provider of performance management software. This session will show how IBM helps Oracle clients unify Enterprise Performance Management (EPM) and Business Intelligence (BI) in a cost-effective manner, supporting a broad range of strategic, financial and operational management processes.
Application Strategy: Charting the Course for Maximum Business Value
10/03/11, 3:30 p.m. – 4:30 p.m., OpenWorld session #39061
Presenter: Mike Marchildon, IBM
The industry is undergoing a shift from a single Enterprise Resource Planning (ERP) application to second-generation platforms containing diverse yet interdependent systems. This shift presents opportunities and challenges for both IT and the business.
I'm down here in Australia, where the government has been a bit stalled for the past two weeks, formally being managed by the [Caretaker government]. Apparently, there is a gap between the outgoing administration and the incoming administration, and the caretaker government is doing as little as possible until the new regime takes over. They are still counting votes, including in some cases dummy ballots known as "donkey votes", the Australian version of the hanging chad. Three independent parties are also trying to decide which major party they will support to finalize the process.
While we are on the topic of a government stalled, I feel bad for the state of Virginia in the United States. Apparently, one of their supposedly high-end enterprise class EMC Symmetrix DMX storage systems, supporting 26 different state agencies in Virginia, crashed on August 25th and now more than a week later, many of those agencies are still down, including the Department of Motor Vehicles and the Department of Taxation and Revenue.
Many of the articles in the press on this event have focused on what this means for the reputation of EMC. Not surprisingly, EMC says that this failure is unprecedented, but really this is just one in a long series of failures from EMC. It reminds me of the last time EMC had a public failure with a dual-controller CLARiiON a few months ago that stopped another company from their operations. There is nothing unique in the physical equipment itself, all IT gear can break or be taken down by some outside force, such as a natural disaster. The real question, though, is why haven’t EMC and the State Government been able to restore operations many days after the hardware was fixed?
In the Boston Globe, Zeus Kerravala, a data storage analyst at Yankee Group in Boston, is quoted as saying that such a high-profile breakdown could undermine EMC’s credibility with large businesses and government agencies. “I think it’s extremely important for them,’’ said Kerravala. “When you see a failure of this magnitude, and their inability to get a customer like the state of Virginia up and running almost immediately, all companies ought to look at that and raise their eyebrows.’’
Was the backup and disaster recovery solution capable of the scale and service level requirements needed by vital state agencies? Had they tested their backups to ensure they were running correctly, and had they tested their recovery plans? Were they monitoring the success of recent backup operations?
Eventually, the systems will be back up and running, fines and penalties will be paid, and perhaps the guy who chose to go with EMC might feel bad enough to give back that new set of golf clubs, or whatever ridiculously expensive gift EMC reps might offer to government officials these days to influence the purchase decision making process.
(Note: I am not accusing any government employee in particular working at the state of Virginia of any wrongdoing, and mention this only as a possibility of what might have happened. I am sure the media will dig into that possibility soon enough during their investigations, so no sense in me discussing that process any further.)
So what lessons can we learn from this?
Lesson 1: You don't just buy technology, you also are choosing to work with a particular vendor
IBM stands behind its products. Choosing a product strictly on its speeds and feeds misses the point. A study IBM and Mercer Consulting Group conducted back in 2007 found that only 20 percent of the purchase decision for storage was from the technical capabilities. The other 80 percent were called "wrapper attributes", such as who the vendor was, their reputation, the service, support and warranty options.
Lesson 2: Losing a single disk system is a disaster, so disaster recovery plans should apply
IBM has a strong Business Continuity and Recovery Services (BCRS) group to help companies and government agencies develop their BC/DR plans. In the planning process, various possible incidents are identified, recovery point objectives (RPO) and recovery time objectives (RTO) are set, and then appropriate action plans are documented on how to deal with them. For example, if the state of Virginia had an RPO of 48 hours, and an RTO of 5 days, then when the failure occurred on August 25, they could have recovered up to August 23 level data (48 hours prior to the incident) and be up and running by August 30 (five days after the incident). I don't personally know what RPO and RTO they planned for, but certainly it seems like they have missed it by now.
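The example above can be sketched in a few lines. The 48-hour RPO and 5-day RTO are the hypothetical values from my example, not Virginia's actual objectives, the year 2010 is my assumption for the incident date, and the function names are illustrative.

```python
from datetime import datetime, timedelta

def recovery_window(incident, rpo, rto):
    """Return (oldest acceptable data point, deadline to be back in service)."""
    return incident - rpo, incident + rto

def rto_missed(now, deadline):
    """True if service is still down after the recovery deadline."""
    return now > deadline

incident = datetime(2010, 8, 25)  # year assumed for illustration
oldest_data, deadline = recovery_window(
    incident,
    rpo=timedelta(hours=48),  # hypothetical RPO
    rto=timedelta(days=5),    # hypothetical RTO
)

print("Recover data no older than:", oldest_data.date())
print("Be up and running by:", deadline.date())
print("Missed a week later?", rto_missed(incident + timedelta(days=8), deadline))
```

Writing the objectives down as concrete dates like this makes it obvious to everyone involved when a recovery effort has gone past what was agreed.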
Lesson 3: BC/DR Plans only work if you practice them often enough
Sadly, many companies and government agencies make plans, but never practice them, so they have no idea if the plans will work as expected, or if they are fundamentally flawed. Just as we often have fire drills that force everyone to stop what they are doing and vacate the office building, anyone with an IT department needs to practice BC/DR plans often enough, not only to ensure the plan itself is solid, but also so that the people involved know what to do and their respective roles in the recovery process.
Lesson 4: This can serve as a wake-up call to consider Cloud Computing as an alternative option
Are you still doing IT in your own organization? Do you feel all of the IT staff have been adequately trained for the job? If your biggest disk system completely failed, not just a minor single or double drive failure, but a huge EMC-like failure, would your IT department know how to recover in less than five days? Perhaps this will serve as a wake-up call to consider alternative IT delivery options. The advantage of big Cloud Service Providers (Microsoft, Google, Yahoo, Amazon, SalesForce.com and of course, IBM) is that they are big enough to have worked out all the BC/DR procedures, and have enough resources to switch over to in case any individual disk system fails.
On Wikibon, David Floyer has an article titled [SAS Drives Tier 1 to New Levels of Green] that focuses on the energy efficiency benefits of newer Serial-Attach SCSI (SAS) drives over older Fibre Channel (FC) drives. This makes sense, as R&D budgets have been spent on making newer technologies more "green".
Of course, people might consider this an [apples-to-oranges] comparison. Not only are we changing from FC to SAS technology, we are also changing from 3.5-inch drives to small form factor (SFF) 2.5-inch drives. It seems odd to specify 2000 drives, when only two of the five scale up to that level. Few systems in production, from any vendor, have more than 1000 drives, so a comparison at that size would have seemed fairer.
However, Hu's conclusion that the combination of SAS and SFF provides better performance and energy efficiency for both IBM DS8800 and HDS VSP than FC-based alternatives from any vendor seems reasonably supported by the data.
Meanwhile, fellow blogger David Merrill (HDS) pokes fun at IBM DS8800 in Figure 2 in his post [Winner o’ the green]. This second comparison was for 4PB of raw capacity, which 4 of the 5 can handle easily using 2TB SATA drives, but the DS8800 is based on SAS technology and does not support 2TB SATA drives. A performance-oriented configuration with four distinct DS8800 boxes employing 600GB SAS drives is used instead, causing the data for the DS8800 to stick out like a sore thumb, or perhaps more intentionally as a middle finger.
The main take-away here is that IBM offers both the DS8700 for capacity-optimized workloads, and the DS8800 for performance-optimized workloads. Some competitors may have been spreading FUD that the DS8700 was withdrawn last month. It wasn't. As you can see from the data presented, there are times where a DS8700 might be preferable to a DS8800, depending on the type of workloads you plan to deploy. IBM offers both, and will continue to support existing DS8700 and DS8800 units in the field for many years to come.
Bill Bauman, IBM System x Field Technical Support Specialist and System x University celebrity, presented the differences between Grid, SOA and Cloud Computing. I thought this was an odd combination to compare and contrast, but his presentation was well attended.
Grid - this is when two or more independently owned and managed computers are brought together to solve a problem. Some research facilities do this. IBM helped four hospitals connect their computers together into a grid to help analyze breast cancer. IBM also supports the [World Community Grid] which allows your personal computer to be connected to the grid and help process calculations.
SOA - SOA, which stands for Service Oriented Architecture, is an approach to building business applications as a combination of loosely-coupled black-box components orchestrated to deliver a well-defined level of service by linking together business processes. I often explain SOA as the business version of Web 2.0. You can download a free copy of the eBook "SOA for Dummies" at the [IBM Smart SOA] landing page.
Cloud - A Cloud is a dynamic, scalable, expandable, and completely contractible architecture. It may consist of multiple, disparate, on-premise and off-premise hardware and virtualized platforms hosting legacy, fully installed, stateless, or virtualized instances of operating systems and application workloads.
Tom Vezina, IBM Advanced Technical Sales Specialist, presented "Chaos to Cloud Computing". Survey results show that roughly 70 percent of cloud spend will be for private clouds, and 30 percent for public, hybrid or community clouds. Of the key motivations for public cloud, 77 percent of respondents cited reducing costs, 72 percent time to value, and 50 percent improving reliability.
Tom ran over 500 "server utilization" studies for x86 deployments during the past eight years. Of these, the worst was 0.52 percent CPU utilization, the best was 13.4 percent, and the average was 6.8 percent. When IBM mentions that 85 percent of server capacity is idle, it is mostly due to x86 servers. At this rate, it seems easy to put five to 20 guest images onto a machine. However, many companies encounter "VM stall" where they get stuck after only 25 percent of their operating system images virtualized.
He feels the problem is with the fact most Physical-to-Virtual (P2V) migrations are manual efforts. There are tools available like Novell [PlateSpin Recon] to help automate and reduce the total number of hours spent per migration.
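To put the utilization figures from Tom's studies in perspective, here is a rough back-of-the-envelope sketch in Python. The 60 percent target utilization and the function name are my own assumptions for illustration, not anything presented in the session, and the estimate looks at CPU only, ignoring memory, I/O and peak-load headroom.

```python
def guests_per_host(avg_util_pct: float, target_util_pct: float = 60.0) -> int:
    """Estimate how many guests of a given average CPU utilization
    fit on one host of similar size before reaching the target.

    target_util_pct is a hypothetical planning ceiling, not a figure
    from the presentation.
    """
    if avg_util_pct <= 0:
        raise ValueError("average utilization must be positive")
    return int(target_util_pct // avg_util_pct)


# Utilization figures quoted from the studies (percent CPU busy)
for label, util in [("worst", 0.52), ("average", 6.8), ("best", 13.4)]:
    print(label, guests_per_host(util))
```

Even the best-utilized server in the studies leaves room for several guests per host under this assumption, which is why stalling at 25 percent virtualized looks more like a process problem than a capacity problem.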
System x KVM Solutions
Boy, I walked into this one. Many of IBM's cloud offerings are based on the Linux hypervisor called Kernel-based Virtual Machine [KVM] instead of VMware or Microsoft Hyper-V. However, this session was about the "other KVM": keyboard video and mouse switches, which thankfully, IBM has renamed to Console Managers to avoid confusion. Presenters Ben Hilmus (IBM) and Steve Hahn (Avocent) presented IBM's line of Local Console Managers (LCM) and Global Console Managers (GCM) products.
LCM are the traditional KVM switches that people are familiar with. A single keyboard, video and mouse can select among hundreds of servers to perform maintenance or check on status. GCM adds KVM-over-IP capabilities, which means that now you can access selected systems over the Ethernet from a laptop or personal computer. Both LCM and GCM allow for two-level tiering, which means that you can have an LCM in each rack, and an LCM or GCM that points to each rack, greatly increasing the number of servers that can be managed from a single pane of glass.
Many servers have a "service processor" to manage the rest of the machine. IBM RSA II, HP iLO, and Dell DRAC4 are some examples. These allow you to turn on and off selected servers. IBM BladeCenter offers a Management Module that allows the chassis to be connected to a Console Manager and select a specific blade server inside. These can also be used with VMware viewer, Virtual Network Computing (VNC), or Remote Desktop Protocol (RDP).
IBM's offerings are unique in that you can have an optical CD/DVD drive or USB external storage attached at the LCM or GCM, and make it look like the storage is attached to the selected server. This can be used to install or upgrade software, transfer log files, and so on. Another great use, and apparently the motivation for having this session in the "Federal Track", is that the USB can be used to attach a reader for a smart card, known as a Common Access Card [CAC] used by various government agencies. This provides two-factor authentication [TFA]. For example, to log into the system, you enter your password (something you know) and swipe your employee badge smart card (something you have). The combination is validated at the selected server to provide access.
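The "something you know plus something you have" idea can be sketched in a few lines of Python. This is only an illustration of the two-factor principle under simplifying assumptions: a real Common Access Card relies on certificates and a PIN rather than a plain card identifier, and the record layout, function names and sample values below are all hypothetical.

```python
import hashlib
import hmac


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2 (standard library)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def two_factor_ok(password: str, card_id: str, record: dict) -> bool:
    """Grant access only if BOTH factors match the stored record."""
    # Factor 1: something you know (the password)
    knows = hmac.compare_digest(
        hash_password(password, record["salt"]), record["password_hash"]
    )
    # Factor 2: something you have (the card read at the console manager)
    has = hmac.compare_digest(
        card_id.encode("utf-8"), record["card_id"].encode("utf-8")
    )
    return knows and has


# Hypothetical user record as the selected server might store it
salt = b"example-salt"
record = {
    "salt": salt,
    "password_hash": hash_password("s3cret", salt),
    "card_id": "CARD-001",
}

print(two_factor_ok("s3cret", "CARD-001", record))  # both factors correct
print(two_factor_ok("s3cret", "CARD-999", record))  # right password, wrong card
```

The point of the design is that neither a stolen password nor a borrowed badge is enough on its own.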
I find it amusing that server people limit themselves to server sessions, and storage people to storage sessions. Sometimes, you have to step "outside your comfort zone" and learn something new, something different. Open your eyes and look around a bit. You might just be surprised what you find.
(FTC note: I work for IBM. IBM considers Novell a strategic Linux partner. Novell did not provide me a copy of PlateSpin Recon, I have no experience using it, and I mention it only in context of the presentation made. IBM resells Avocent solutions, and we use LCM gear in the Tucson Executive Briefing Center.)
It's Tuesday, and you know what that means? IBM Announcements! This week I am in beautiful Orlando, Florida for the [IBM Systems Technical University] conference.
This week, IBM announced its latest tape offerings for the seventh generation of Linear Tape Open (LTO-7), providing huge gains in performance and capacity.
For capacity, the new LTO-7 cartridges can hold up to 6TB native capacity, or 15TB effective capacity with 2.5x compression for typical data. That is 2.4x larger than the 2.5TB cartridges available with LTO-6. Performance is also nearly doubled, with a native throughput of 315 MB/sec, or 780 MB/sec effective throughput with 2.5x compression. The LTO consortium, of which IBM is a founding member, has published the roadmap for LTO generations to LTO-8, LTO-9 and LTO-10.
IBM will offer both half-height and full-height LTO-7 tape drives. All the features you love from LTO-6 like WORM, partitioning and Encryption carry forward. These drives will be supported on a variety of distributed operating systems, including Linux on z System mainframes, and the IBM i platform on POWER Systems.
The Linear Tape File System (LTFS) can be used to treat LTO-7 cartridges in much the same way as Compact Discs or USB memory sticks, allowing one person to create content on an LTO-7 tape cartridge, and pass that cartridge to the next employee, or to another company. LTFS is also the basis for IBM Spectrum Archive that allows tape data to be part of a global namespace with IBM Spectrum Scale.
LTO-7 will be supported on the TS2900 auto-loader, as well as all of IBM's tape libraries: TS3100, TS3200, TS3310, TS3500 and TS4500. You can connect up to 15 TS3500 tape libraries together with shuttle connectors, for a maximum of 2,700 drives serving 300,000 cartridges, and a maximum capacity of 1.8 Exabytes of data in a single system environment.
In addition to LTO-7 support, the IBM TS4500 tape library was also enhanced. You can now grow it up to 18 frames, and have up to 128 drives serving 23,170 cartridges, for a maximum capacity of 139 PB of data. You can now also intermix LTO and 3592 frames in the same TS4500 tape library.
For compatibility, LTO-7 drives can read existing LTO-5 and LTO-6 tape cartridges, and can write to LTO-6 media, to help clients with transition.
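For readers who like to check the math, the capacity figures in this announcement follow from simple multiplication. The short Python sketch below uses only numbers quoted above, with decimal units (1 PB = 1,000 TB); the constant and function names are mine, chosen for illustration.

```python
TB_PER_PB = 1_000
TB_PER_EB = 1_000_000

LTO7_NATIVE_TB = 6.0   # native capacity per LTO-7 cartridge
LTO6_NATIVE_TB = 2.5   # native capacity per LTO-6 cartridge
COMPRESSION = 2.5      # compression ratio assumed for typical data


def effective_tb(native_tb: float, ratio: float = COMPRESSION) -> float:
    """Effective cartridge capacity after compression."""
    return native_tb * ratio


def library_native_tb(cartridges: int, native_tb: float = LTO7_NATIVE_TB) -> float:
    """Native capacity of a library fully populated with one cartridge type."""
    return cartridges * native_tb


print(effective_tb(LTO7_NATIVE_TB))              # TB per cartridge, compressed
print(LTO7_NATIVE_TB / LTO6_NATIVE_TB)           # growth over LTO-6
print(library_native_tb(300_000) / TB_PER_EB)    # 15 connected TS3500, in EB
print(library_native_tb(23_170) / TB_PER_PB)     # 18-frame TS4500, in PB
```

Note that the library maximums are native capacities, so the effective numbers with compression would be higher still.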
You may not be the right person to ask but I am asking everyone so "How do you see hybrid disk drives?"
(For the record, I am not immediately related to Robert. At one point, "Pearson" was the 12th most common surname in the USA, but now doesn't even make the Top 100.)
Robert, I would like to encourage you and everyone else to ask questions. Don't worry if I am the wrong person to ask, as I probably know the right person within IBM. Some people have called me the "Kevin Bacon" of Storage, as I am often less than six degrees away from the right person, having worked in IBM Storage for over 20 years.
For those not familiar with hybrid drives, there is a good write-up in Wikipedia.
Unfortunately, most of the people I would consult on this question, such as those from Market Intelligence or Research, are on vacation for the holidays, so, Robert, I will have to rely on my trusted 78-card Tarot deck and answer you with a five-card throw.
Your first card, Robert, is the Hermit. This card represents "introspection". The best I/O is no I/O, which means that if applications can keep the information they need inside server memory, you can avoid the bus bandwidth limitations of going to external storage devices. Where external storage makes sense is when data is shared between servers, or when the single server is limited to a set amount of internal memory. So, consider maxing out the memory in your server first (IBM would be glad to sell you more internal memory!!!), then consider outside solid-state or hybrid devices. 32-bit Windows, for example, has an architectural limit of 4GB.
Your second card, Robert, is the Four of Cups, representing "apathy". On the card, you see three cups together, with the fourth cup being delivered from a cloud. This reminds me that we have three storage tiers already (memory, disk, tape), and introducing a fourth tier into the mix may not garner much excitement. For the mainframe, IBM introduced a Solid-State Device, called the Coupling Facility, which can be accessed from multiple System z servers. It is used heavily by DFSMS and DB2 to hold shared information. However, given some customers' apathy towards Information Lifecycle Management which includes "tiered storage", introducing yet another tier that forces people to decide what data goes where may be another challenge.
Your third card, Robert, is the Chariot, which represents "Speed, Determination, and Will". In some cases, solid-state disks are faster for reading, but can be slower for writing. In the case of a hybrid drive, where the memory acts as a front-end cache, read-hits would be faster, but read-misses might be slower. While the idea of stopping the drives during inactivity will reduce power consumption, spinning up and slowing down the disk may incur additional performance penalties. At the time of this post, the fastest disk system remains the IBM SAN Volume Controller, based on SPC-1 and SPC-2 benchmarks in excess of those published for other devices.
Your fourth card, Robert, is the Eight of Pentacles, which represents "Diligence, Hard work". The pentacles are coins with five-sided stars on them, and this often represents money. Our research team has projected that spinning disk will continue to be a viable and profitable storage media for at least another eight years.
Your fifth and last card, Robert, is the World, which normally represents "Accomplishment", but since it is turned upside down, the meaning is reversed to "Limitation". Some hybrid disks, and some types of solid state memory in general, do have limitations in the number of write cycles they can handle. Those unhappy with the frequency and slowness of rebuilds on SATA disk may find similar problems with hybrid drives. For that reason, businesses may not trust using hybrid drives for their busiest, mission-critical applications, but certainly might use them for archive data with lower write-cycle requirements.
The tarot cards are never wrong, but certainly interpretations of the cards can be.
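For those who prefer arithmetic to tarot, the Chariot card's point about read-hits, read-misses and spin-up penalties can be sketched as a weighted average. The Python below is a minimal model under stated assumptions: every latency value, the hit ratio, and the fraction of misses that find the platters stopped are made-up illustrative numbers, not measurements of any hybrid drive.

```python
def effective_read_latency_ms(
    hit_ratio: float,
    hit_ms: float,
    miss_ms: float,
    spin_up_ms: float = 0.0,
    asleep_fraction: float = 0.0,
) -> float:
    """Average read latency for a drive with a front-end flash cache.

    hit_ratio       fraction of reads served from the flash cache
    hit_ms          latency of a cache hit
    miss_ms         latency of a miss when the platters are spinning
    spin_up_ms      extra delay when the platters must spin up first
    asleep_fraction fraction of misses that find the platters stopped
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_cost = miss_ms + asleep_fraction * spin_up_ms
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_cost


# Hypothetical numbers: 0.1 ms flash hit, 10 ms disk read, 3 s spin-up
always_spinning = effective_read_latency_ms(0.9, 0.1, 10.0)
with_spin_down = effective_read_latency_ms(0.9, 0.1, 10.0, 3000.0, 0.5)
print(always_spinning, with_spin_down)
```

With these made-up numbers, the spin-up delay on a small share of misses swamps the benefit of the cache hits, which is the trade-off between power savings and performance described above.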
Continuing my post-week coverage of the [Data Center 2010 conference], Wednesday evening we had six hospitality suites. These are fun informal get-togethers sponsored by various companies. I present them in the order that I attended them.
Intel - The Silver Lining
Intel called their suite "The Silver Lining". Magician Joel Bauer wowed the crowds with amazing tricks.
Intel handed out branded "Snuggies". I had to explain to this guy that he was wearing his backwards.
i/o - Wrestling with your Data Center?
Newcomer "i/o" named their suite "Wrestling with your Data Center?" They invited attendees frustrated with their data centers to don inflated Sumo Wrestling suits.
APC by Schneider Electric - Margaritaville
This will be the last year for Margaritaville, a theme that APC has used now for several years at this conference.
Cisco - Fire and Ice
Cisco had "Fire and Ice" with half the room decorated in Red for fire, and White for ice.
This is Ivana, welcoming people to the "Ice" side.
This is Peter, on the "Fire" side. Cisco tried to have opposites on both sides, savory food on one side, sweets on the other.
CA Technologies - Can you Change the Game?
CA Technologies offered various "sports games", with a DJ named "Coach".
Compellent - Get "Refreshed" at the Fluid Data Hospitality Suite
Compellent chose a low-key format, "lights out" approach with a live guitarist. They had hourly raffles for prizes, but it was too dark to read the raffle ticket numbers.
Of the six, my favorite was Intel. The food was awesome, the Snuggies were hilarious, and the magician was incredibly good. I would like to thank Intel for providing me super-secret inside access to their Cloud Computing training resources and for the Snuggie!
My how time flies. This week marks my 24th anniversary working here at IBM. This would have escaped me completely, had I not gotten an email reminding me that it was time to get a new laptop. IBM manages these on a four-year depreciation schedule, and I received my current laptop back in June 2006, on my 20th anniversary.
When I first started at IBM, I was a developer on DFHSM for the MVS operating system, now called DFSMShsm on the z/OS operating system. We all had 3270 [dumb terminals], large cathode ray tubes affectionately known as "green screens", and all of our files were stored centrally on the mainframe. When Personal Computers (PC) were first deployed, we were getting 120 machines, in five batches of 24 systems each, spaced out over the next two years. I was assigned the job of recommending who should get a PC during the first batch, the second batch, and so on. I was concerned that everyone would want to be part of the first batch, so I put out a survey, asking questions on how familiar they were with personal computers, whether they owned one at home, were familiar with DOS or OS/2, and so on.
It was actually my last question that helped make the decision process easy:
How soon do you want a Personal Computer to replace your existing 3270 terminal?
As late as possible
I had five options, and roughly 24 respondents checked each one, making my job extremely easy. Ironically, once the early adopters of the first batch discovered that these PC could be used for more than just 3270 terminal emulation, many of the others wanted theirs sooner.
Back then, IBM employees resented any form of change. Many took their new PC, configured it to be a full-screen 3270 emulation screen, and continued to work much as they had before. My mentor, Jerry Pence, would print out his emails, and file the printed emails into hanging file folders in his desk credenza. He did not trust saving them on the mainframe, so he was certainly not going to trust storing them on his new PC. One employee used his PC as a door stop, claiming he would continue to use his 3270 terminal until they took it away from him.
Moving forward to 2006, I was one of the first in my building to get a ThinkPad T60. It was so new that many of the accessories were not yet available. It had Windows XP on a single-core 32-bit processor, 1GB RAM, and a huge 80GB disk drive. The built-in 1GbE Ethernet went unused for a while, as we had a 16 Mbps Token Ring network.
I was the marketing strategist for IBM System Storage back then, and needed all this excess power and capacity to handle all my graphic-intense applications, like GIMP and Second Life.
Over the past four years, I made a few slight improvements. I partitioned the hard drive to dual-boot between Windows and Linux, and created a separate partition for my data that could be accessed from either OS. I increased the memory to 2GB and replaced the disk with a drive holding 120GB capacity.
A few years ago, IBM surprised us by deciding to support Windows, Linux and Mac OS computers. But actually it made a lot of sense. IBM's world-renowned global services manages the help-desk support of over 500 other companies in addition to the 400,000 employees within IBM, so they already had to know how to handle these other operating systems. Now we can choose whichever we feel makes us more productive. Happy employees are more productive, of course. IBM's vision is that almost everything you need to do would be supported on all three OS platforms:
Access your email, calendar, to-do list and corporate databases via Lotus Notes on either Windows, Linux or Mac OS. Corporate databases store our confidential data centrally, so we don't have to have them on our local systems. We can make local replicas of specific databases for offline access, and these are encrypted on our local hard drive for added protection. Emails can link directly to specific entries in a database, so we don't have huge attachments slowing down email traffic. IBM also offers LotusLive, a public cloud offering for companies to get out of managing their own email Lotus Domino repositories.
Create presentations, documents and spreadsheets on either Windows, Linux or Mac OS. Lotus Symphony is based on open source OpenOffice and is compatible with Microsoft Office. This allows us to open and update directly in Microsoft's PPT, DOC and XLS formats.
Many of the corporate applications have now been converted to be browser-accessible. The Firefox browser is available on Windows, Linux and Mac OS. This is a huge step forward, in my opinion, as we often had to download applications just to do the simplest things like submit our time-sheet or travel expense reimbursement. I manage my blog, Facebook and Twitter all from online web-based applications.
The irony here is that the world is switching back to thin clients, with data stored centrally. The popularity of Web 2.0 helped this along. People are using Google Docs or Microsoft OfficeOnline to eliminate having to store anything locally on their machines. This vision positions IBM employees well for emerging cloud-based offerings.
Sadly, we are not quite completely off Windows. Some of our Lotus Notes databases use Windows-only APIs to access our Siebel databases. I have encountered PowerPoint presentations and Excel spreadsheets that just don't render correctly in Lotus Symphony. And finally, some of our web-based applications work only in Internet Explorer! We use the outdated IE6 corporate-wide, which is enough reason to switch over to Firefox, Chrome or Opera browsers. I have to put special tags on my blog posts to suppress YouTube and other embedded objects that aren't supported on IE6.
So, this leaves me with two options: Get a Mac and run Windows on the side as a guest operating system, or get a ThinkPad to run Windows or Windows/Linux. I've opted for the latter, and put in my order for a ThinkPad 410 with a dual-core 64-bit i5 Intel processor, VT-capable to provide hardware-assistance for virtualization, 4GB of RAM, and a huge 320GB drive. It will come installed with Windows XP as one big C: drive, so it will be up to me to re-partition it into a Windows/Linux dual-boot and/or Windows and Linux running as guest OS machine.
(Full disclosure to make the FTC happy: This is not an endorsement for Microsoft or against Apple products. I have an Apple Mac Mini at home, as well as Windows and Linux machines. IBM and Apple have a business relationship, and IBM manufactures technology inside some of Apple's products. I own shares of Apple stock, I have friends and family that work for Microsoft that occasionally send me Microsoft-logo items, and I work for IBM.)
I have until the end of June to receive my new laptop, re-partition, re-install all my programs, reconfigure all my settings, and transfer over my data so that I can send my old ThinkPad T60 back. IBM will probably refurbish it and send it off to a deserving child in Africa.
If you have an old PC or laptop, please consider donating it to a child, school or charity in your area. To help out a deserving child in Africa or elsewhere, consider contributing to the [One Laptop Per Child] organization.
Continuing my drawn out coverage of IBM's big storage launch of February 9, today I'll cover the IBM System Storage TS7680 ProtecTIER data deduplication gateway for System z.
On the host side, TS7680 connects to mainframe systems running z/OS or z/VM over FICON attachment, emulating an automated tape library with 3592-J1A devices. The TS7680 includes two controllers that emulate the 3592 C06 model, with 4 FICON ports each. Each controller emulates up to 128 virtual 3592 tape drives, for a total of 256 virtual drives per TS7680 system. The mainframe sees up to 1 million virtual tape cartridges, up to 100GB raw capacity each, before compression. For z/OS, the automated library has full SMS Tape and Integrated Library Management capability that you would expect.
Inside, the two control units are both connected to a redundant two-node cluster of ProtecTIER engines running the HyperFactor deduplication algorithm, which processes the deduplication inline, as data is ingested, rather than the post-process approach that other deduplication solutions use. These engines are similar to the TS7650 gateway machines for distributed systems.
On the back end, these ProtecTIER deduplication engines are then connected to external disk, up to 1PB. If you get 25x data deduplication ratio on your data, that would be 25PB of mainframe data stored on only 1PB of physical disk. The disk can be any disk supported by ProtecTIER over FCP protocol, not just the IBM System Storage DS8000, but also the IBM DS4000, DS5000 or IBM XIV storage system, various models of EMC and HDS, and of course the IBM SAN Volume Controller (SVC) with all of its supported disk systems.
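The TS7680 numbers above are easy to verify with a little arithmetic. The Python sketch below uses only the figures quoted in this post; the function names are mine, and the 25x ratio is the illustrative "if you get 25x" case, since actual deduplication ratios depend on the data.

```python
CONTROLLERS = 2
VIRTUAL_DRIVES_PER_CONTROLLER = 128
MAX_PHYSICAL_DISK_PB = 1.0


def total_virtual_drives() -> int:
    """Virtual 3592 drives presented by both controllers combined."""
    return CONTROLLERS * VIRTUAL_DRIVES_PER_CONTROLLER


def logical_capacity_pb(physical_pb: float, dedup_ratio: float) -> float:
    """Mainframe data that fits on a given amount of back-end disk."""
    return physical_pb * dedup_ratio


def physical_needed_pb(logical_pb: float, dedup_ratio: float) -> float:
    """Back-end disk required to hold a given amount of mainframe data."""
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return logical_pb / dedup_ratio


print(total_virtual_drives())
print(logical_capacity_pb(MAX_PHYSICAL_DISK_PB, 25.0))
print(physical_needed_pb(10.0, 25.0))
```

The last line shows the planning question in reverse: at the same assumed ratio, 10PB of mainframe tape data would need well under half a petabyte of physical disk.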
EMC Corporation (NYSE:EMC) today announced it has been positioned as a leader in the Forrester Wave™: Enterprise Open Systems Virtual Tape Library (VTL), Q1 2008 by Forrester Research, Inc. (January 31, 2008), an independent market and technology research firm. EMC achieved a position as a leader in the Forrester Wave report on virtual tape libraries based on the largest installed base of the EMC® Disk Library family of systems, its broad ecosystem interoperability. Virtual tape libraries emulate tape drives and work in conjunction with existing backup software applications, enabling fast backup and restoration of data by using high-capacity, low-cost disk drives.
EMC was the first major vendor in the open systems virtual tape library market as it introduced the EMC Disk Library in April 2004 and today is a leading provider of open systems virtual tape solutions, with systems that are designed for businesses and organizations of all sizes.
While the press release implies that "EDL equals VTL", Chuck tries to explain they are in fact very different. Here is an excerpt from his blog post:
Virtual Tape Libraries vs. Disk Libraries
As many of you know, VTLs have been around for a while. They use disk as a cache -- they buffer the incoming backup streams, do some housekeeping and stacking, then turn around and write tape efficiently. When you go to restore, you're usually coming back off of tape, unless the backup image in question is sitting in the disk cache.
Now, there is nothing wrong with the VTL approach, but it was conceived in a time when disks were horribly expensive. It was also pretty clear to many of us that disks were going to be a whole lot cheaper in the near future, and this fundamental assumption wouldn't be valid for much longer.
I kept thinking in terms of disk as a direct target for a backup application. No modifications to the backup application. Native speed of sequential disks for both backup and restore. Tape positioned as a backup to the backup. Use the strengths of the underlying array (e.g. CLARiiON) for performance, availability, management, etc.
We ended up calling the concept a "disk library" to differentiate from the VTLs that had come before it. It was a different value proposition and offering, based on the emergence of lower-cost disk media.
... It's nice to see we're at 1,100+ customers, and still going strong.
For those new to the blogosphere, there is a difference between "Press Releases" as formal corporate communications versus "Blog Posts" which are informal opinions of the individual blogger, which may or may not match exactly the views of their respective employer. As we've learned many times before, one should not treat terms like "first" or "leader" in corporate press releases literally! Let's explore each.
Was EDL the first "open systems" Virtual Tape Library?
This is implied by the Forrester report. Chuck mentions the "VTLs that had come before it" in his blog, and many people are aware that IBM and StorageTek had introduced mainframe-attached VTLs in the 1990s. But what about VTL for "open systems"?
(Hold aside for the moment that IBM System z mainframe is an open system itself, with z/OS certified as a bona fide UNIX operating system by [the Open Group] standards body. Most analysts and research firms usually refer only to the non-mainframe versions of UNIX and Windows. Alternative definitions for "open systems" can be found in [Web definitions or Wikipedia]. I will assume Forrester meant non-mainframe servers.)
IBM announced AIX non-mainframe attachment via SCSI connectivity to the IBM 3494 Virtual Tape Server (VTS) on Feb 16, 1999, with general availability on May 28, 1999. That's nearly FIVE YEARS before the April 2004 introduction of EDL. IBM VTS support for Sun Solaris and Microsoft Windows came shortly thereafter in November 2000, and support for HP-UX a bit later in June 2001. One of my 17 patents is for the software inside the IBM 3494 VTS, so like Chuck, I can take some pride in a successful product.
(I don't remember if StorageTek, which was subsequently acquired by Sun, had ever supported non-mainframe operating systems with their Virtual Storage Manager [VSM] offering, but if they did, I am sure it was also before EMC.)
Last week, another EMC blogger, BarryB (aka [the Storage Anarchist]), took me to task in comments on my post [IBM now supports 1TB SATA drives]. He felt that IBM should not claim support, given that the software inside the IBM System Storage N series is developed by NetApp. He compared this to the situation of HP and Sun re-badging the HDS USP-V disk system. If someone else wrote the software, BarryB opines, IBM should not claim credit for it. I tried to explain how IBM provides added value and has full-time employees dedicated to N series development and support, but doubt I have changed his mind.
Why do I bring that up? Because the EMC Disk Library runs OEM software from FalconStor. Basically EMC is assembling a hardware/software solution with components provided from OEM suppliers. Hmmm? Sound familiar? Who is calling the kettle black?
If there is a clear winner here, it is FalconStor itself. Perhaps one of the worst kept industry secrets is that FalconStor software is also used in VTL offerings from Sun, Copan, and IBM, the latter embodied as the [IBM TS7520 Virtualization Engine] offering. If you like the concept of an EDL, but prefer instead one-stop shopping from an "information infrastructure" vendor, IBM can offer the TS7520 along with servers, software and services for a complete end-to-end solution.
Can EMC claim to be "a leader" in Virtual Tape Libraries?
During the measured quarter, IBM shipped its 10 millionth LTO-4 tape drive cartridge to Getty Images, the world's leading creator and distributor of still imagery, footage and multi-media products, as well as a recognized provider of other forms of premium digital content, including music. Getty Images is using the LTO-4 drives as part of a tiered infrastructure of IBM disk and tape solutions that help support the backup needs of their digital imagery;
IBM shipped more than 1,500 Petabytes of tape storage in Q3'07 alone;
During Q3'07, IBM shipped the 10,000th IBM System Storage TS3500 Tape Library. The TS3500 is a highly scalable tape library with support from 1 to 192 tape drives and up to 6,400 cartridge slots for open system, mainframe and virtual tape system attachment.
Let's take a look at the numbers. IBM has sold over 5,400 virtual tape libraries. Sun/STK has sold over 4,000 virtual tape libraries. Both are drastically more than the 1,100 mentioned in Chuck's post. Does IDC recognize EMC in third place? No, EMC chooses instead to declare EDL as disk arrays (probably to prop up their IDC "Disk Tracker" numbers), so they don't even earn an honorable mention under the virtual tape library category. This of course includes the number of mainframe-attached models from IBM and Sun/STK. So, if EMC did call these tape systems instead, they might show up in third place, and as such EMC could claim to be "a leader" in much the same way an athlete can claim to be an "Olympic medalist" winning the bronze for third place. (If you limit the count to just the FalconStor-based models from IBM, EMC, Sun and Copan, then EMC moves up to first or second, but then press release titles like "EMC a Leader in FalconStor-based non-mainframe Virtual Tape Libraries" can get too confusing.)
Chuck, if you are reading this, I feel you have every right to celebrate your involvement with the EDL. Despite having common software and hardware components, both IBM and EMC can rightfully declare their own unique value-add through their respective VTL offerings. Like the IBM N series, the EMC Disk Library is not diminished by the fact the software was written by someone else. BarryB might disagree.