This blog is for the open exchange of ideas relating to IBM Systems, storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
Tony Pearson is a Master Inventor, Senior IT Architect and Event Content Manager for [IBM Systems for IBM Systems Technical University] events. With over 30 years with IBM Systems, Tony is frequent traveler, speaking to clients at events throughout the world.
Lloyd Dean is an IBM Senior Certified Executive IT Architect in Infrastructure Architecture. Lloyd has held numerous senior technical roles at IBM during his 19 plus years at IBM. Lloyd most recently has been leading efforts across the Communication/CSI Market as a senior Storage Solution Architect/CTS covering the Kansas City territory. In prior years Lloyd supported the industry accounts as a Storage Solution architect and prior to that as a Storage Software Solutions specialist during his time in the ATS organization.
Lloyd currently supports North America storage sales teams in his Storage Software Solution Architecture SME role in the Washington Systems Center team. His current focus is with IBM Cloud Private and he will be delivering and supporting sessions at Think2019, and Storage Technical University on the Value of IBM storage in this high value IBM solution a part of the IBM Cloud strategy. Lloyd maintains a Subject Matter Expert status across the IBM Spectrum Storage Software solutions. You can follow Lloyd on Twitter @ldean0558 and LinkedIn Lloyd Dean.
Tony Pearson's books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
The developerWorks Connections platform will be sunset on December 31, 2019. On January 1, 2020, this community and its apps will no longer be available. More details available on our FAQ.
My series last week on IBM Watson (which you can read [here], [here], [here], and [here]) brought attention to IBM's Scale-Out Network Attached Storage [SONAS]. IBM Watson used a customized version of SONAS technology for its internal storage, and like most of the components of IBM Watson, IBM SONAS is commercially available as a stand-alone product.
Like many IBM products, SONAS has gone through various name changes. First introduced by Linda Sanford at an IBM SHARE conference in 2000 under the IBM Research codename Storage Tank, it was then delivered as a software-only offering SAN File System, then as a services offering Scale-out File Services (SoFS), and now as an integrated system appliance, SONAS, in IBM's Cloud Services and Systems portfolio.
If you are not familiar with SONAS, here are a few of my previous posts that go into more detail:
This week, IBM announces that SONAS has set a world record benchmark for performance, [a whopping 403,326 IOPS for a single file system]. The results are based on comparisons of publicly available information from Standard Performance Evaluation Corporation [SPEC], a prominent performance standardization organization with more than 60 member companies. SPEC publishes hundreds of different performance results each quarter covering a wide range of system performance disciplines (CPU, memory, power, and many more). SPECsfs2008_nfs.v3 is the industry-standard benchmark for NAS systems using the NFS protocol.
(Disclaimer: Your mileage may vary. As with any performance benchmark, the SPECsfs benchmark does not replicate any single workload or particular application. Rather, it encapsulates scores of typical activities on a NAS storage system. SPECsfs is based on a compilation of workload data submitted to the SPEC organization, aggregated from tens of thousands of fileservers, using a wide variety of environments and applications. As a result, it is comprised of typical workloads and with typical proportions of data and metadata use as seen in real production environments.)
The configuration tested involves SONAS Release 1.2 on 10 Interface Nodes and 8 Storage Pods, resulting a single file system over 900TB usable capacity.
10 Interface Nodes; each with:
Maximum 144 GB of memory
One active 10GbE port
8 Storage Pods; each with:
2 Storage nodes and 240 drives
Drive type: 15K RPM SAS hard drives
Data Protection using RAID-5 (8+P) ranks
Six spare drives per Storage Pod
IBM wanted a realistic "no compromises" configuration to be tested, by choosing:
Regular 15K RPM SAS drives, rather than a silly configuration full of super-expensive Solid State Drives (SSD) to plump up the results.
Moderate size, typical of what clients are asking for today. The Goldilocks rule applies. This SONAS is not a small configuration under 100TB, and nowhere close to the maximum supported configuration of 7,200 disks across 30 Interface Nodes and 30 Storage Pods.
Single file system, often referred to as a global name space, rather than using an aggregate of smaller file systems added together that would be more complicated to manage. Having multiple file systems often requires changes to applications to take advantage of the aggregate peformance. It is also more difficult to load-balance your performance and capacity across multiple file systems. Of course, SONAS can support up to 256 separate file systems if you have a business need for this complexity.
The results are stunning. IBM SONAS handled three times more workload for a single file system than the next leading contender. All of the major players are there as well, including NetApp, EMC and HP.
"When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, the actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1 Terabyte (TB). For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained, Watson is NOT going to the internet searching for answers."
I had several readers ask me to explain the significance of the "Terabyte". I'll work my way up.
A bit is simply a zero (0) or one (1). This could answer a Yes/No or True/False question.
Most computers have standardized a byte as a collection of 8 bits. There are 256 unique combinations of ones and zeros possible, so a byte could be used to storage a 2-digit integer, or a single upper or lower case character in the English alphabet. In pratical terms, a byte could store your age in years, or your middle initial.
The Kilobyte is a thousand bytes, enough to hold a few paragraphs of text. A typical written page could be held in 4 KB, for example.
The IBM Challenge to play on Jeopardy! is being compared to the historic 1969 moon landing. To land on the moon, Apollo 11 had the "Apollo Guidance Computer" (AGC) which had 74KB of fixed read-only memory, and 2KB of re-writeable memory. Over [3500 IBM employees were involved] to get the astronauts to the moon and safely back to earth again.
The importance of this computer was highlighted in a [lecture by astronaut David Scott] who said: "If you have a basketball and a baseball 14 feet apart, where the baseball represents the moon and the basketball represents the Earth, and you take a piece of paper sideways, the thinness of the paper would be the corridor you have to hit when you come back."
The Megabyte is a thousand KB, or a million bytes. The 3.5-inch floppy diskette, mentioned in my post [A Boxfull of Floppies] could hold 1.44MB, or about 360 pages of text.
In the article [Wikipedia as a printed book], the printing of a select 400 articles resulted in a book 29 inches thick. Those 5,000 pages would consume about 20 MB of space.
One of my favorite resources I use to search is the Internet Movie Data Base [IMDB]. Leaving out the photos and videos, the [text-only portion of the IMDB database is just over 600 MB], representing nearly all of the actors, awards, nominations, television shows and movies. A standard CD-ROM can hold 700MB, so the text portion of the IMDB could easily fit on a single CD.
The Gigabyte is a thousand MB, or a billion bytes. My Thinkpad T410 laptop has 4GB of RAM and 320GB of hard disk space. My laptop comes with a DVD burner, and each DVD can hold up to 4.7GB of information.
The popular Wikipedia now has some 17 million articles, of which 3.5 million are in English language. It would only take [14GB of space to hold the entire English portion] of Wikipedia. That is small enough to fit on twenty CDs, three DVDs, an Apple iPad or my cellphone (a Samsung Galaxy S Vibrant).
Perhaps you are thinking, "Someone should offer Wikipedia pre-installed on a small handheld!" Too late. The [The Humane Reader] is able to offer 5,000 books and Wikipedia in a small device that connects to your television. This would be great for people who do not have access to the internet, or for parents who want their kids to do their homework, but not be online while they are doing it.
In the latest 2009 report of [How Much Information?] from the University of California, San Diego, the average American consumes 34 GB of information. This includes all the information from radio, television, newspapers, magazines, books and the internet that a person might look at or listen to throughout the day. This project is sponsored by IBM and others to help people understand the nature of our information-consuption habits.
Back in 1992, I visited a client in Germany. Their 90 GB of disk storage attached to their mainframe was the size of three refrigerators, and took five full-time storage administrators to manage.
The Terabyte is a thousand GB, or a trillion bytes. It is now possible to buy external USB drive for your laptop or personal computer that holds 1TB or more. However, at 40MB/sec speeds that USB 2.0 is capable of, it would take seven hours to do a bulk transfer in or out of the device.
IBM offers 1TB and 2TB disk drives in many of our disk systems. In 2008, IBM was preparing to announce the first 1TB tape drive. However, Sun Microsystems announced their own 1TB drive the day before our big announcement, so IBM had to rephrase the TS1130 announcement to [The World's Fastest 1TB tape drive!]
A typical academic research library will hold about 2TB of information. For the [US Library of Congress] print collection is considered to be about 10TB, and their web capture team has collected 160TB of digital data. If you are ever in the Washington DC, I strongly recommend a visit to the Library of Congress. It is truly stunning!
Full-length computer animated movies, like [Happy Feet], consume about 100TB of disk storage during production. IBM offers disk systems that can hold this much data. For example, the IBM XIV can hold up to 151 TB of usable disk space in the size of one refrigerator.
A Key Performance Indicator (KPI) for some larger companies is the number of TB that can be managed by a full-time employee, referred to as TB/FTE. Discussions about TB/FTE are available from IT analysts including [Forrester Research] and [The Info Pro].
The website [Ancestry.com] claims to have over 540 million names in its genealogical database, with a storage of 600TB, with the inclusion of [US census data from 1790 to 1930]. The US government took nine years to process the 1880 census, so for the 1890 census, it rented equipment from Herman Hollerith's Tabulating Machine Company. This company would later merge with two others in 1911 to form what is now called IBM.
A Petabyte is thousand TB, or a quadrillion bytes. It is estimated that all printed materials on Earth would represent approximately 200 PB of information.
IBM's largest disk system, the Scale-Out Network Attach Storage (SONAS) comprised of up to 7,200 disk drives, which can hold over 11 PB of information. A smaller 10-frame model, the same size as IBM Watson, with six interface nodes and 19 storage pods, could hold over 7 PB of information.
For those of us in the IT industry, 1TB is small potatoes. I for one, was expecting it to be much bigger. But for everyone else, the equivalent of 200 million pages of text that IBM Watson has loaded inside is an incredibly large repository of information. I suspect IBM Watson probably contains the complete works of Shakespeare as well as other fiction writers, the IMDB database, all 3.5 million articles of Wikipedia, religious texts like the Bible and the Quran, famous documents like the Magna Carta and the US Constitution, and reference books like a Dictionary, a Thesaurus, and "Gray's Anatomy". And, of course, lots and lots of lists.
For those on Twitter, follow [@ibmwatson] these next three days during the challenge.
The weather has warmed up here in Tucson so I started my Spring Cleaning early this year and unearthed from my garage a [Bankers Box] full of floppy diskettes.
IBM invented the floppy disk back in 1971, and continued to make improvements and enhancements through the 1980s and 1990s. It will be one of the many inventions celebrated as part of IBM's Centennial (100-year) anniversary. Here is an example [T-shirt]
IBM needed a way to send out small updates and patches for microcode of devices out in client locations. IBM had drives that could write information, and sent out "read-only" drives to the customer locations to receive these updates. These were flexible plastic circles with a magnetic coating, and placed inside a square paper sleeve. Imagine a floppy disk the size of a piece of standard paper. The 8-inch floppy fit conveniently in a manila envelope, sendable by standard mail, and could hold nearly 80KB of data.
I've been using floppies for the past thirty years. Here's some of my fondest memories:
While still in high school, my friend Franz Kurath and I formed "Pearson Kurath Systems", a software development firm. We wrote computer programs to run on UNIX and Personal Computers for small businesses here in Tucson. Whenever we developed a clever piece of code, a subroutine or procedure, we would save it on a floppy disk and re-use it for our next project. We wrote in the BASIC language, and our databases were simple Comma-Separated-Variable (CSV) flat files.
The 5.25-inch floppies we used could hold 360KB, and were flexible like the 8-inch models. Later versions of these 5.25-inch floppies would be able to hold as much as 1.2MB of data. We would convert single-sided floppies into double-sided ones by cutting out a notch in the outer sleeve. Covering up the notches would mark them as read-only.
The 3.5-inch floppies were introduced with a hard plastic shell, with the selling point that you can slap on a mailing label and postage and send it "as is" without the need for a separate envelope. These new 3.5-inch floppies would carry "HD" for high density 720KB, and double-sided versions could hold 1.44MB of data. The term "diskette" was used to associate these new floppies with [hard-shelled tape cassettes]. Sliding a plastic tab would allow floppies to be marked "read-only". IBM has the patent on this clever invention.
Continuing our computer programming business in college, Franz and I took out a bank loan to buy our first Personal Computer, for over $5000 dollars USD. Until then, we had to use equipment belonging to each client. The banks we went to didn't understand why we needed a computer, and suggested we just track our expenses on traditional green-and-white ledger paper. Back then, peronsal computers were for balancing your checkbook, playing games and organizing your collection of cooking recipies. But for us, it was a production machine. A computer with both 5.25-inch and 3.5-inch drives could copy files from one format to another as needed. The boost in productivity paid for itself within months.
Apple launched its Macintosh computer in 1984, with a built-in 3.5-inch disk drive as standard equipment. Here is a YouTube video of an [astronaut ejecting a floppy disk] from an Apple computer in space.
In my senior year at the University of Arizona, my roommate Dave had borrowed my backpack to hold his lunch for a bike ride. He thought he had taken everything out, but forgot to remove my 3.5-inch floppy diskette containing files for my senior project. By the time he got back, the diskette was covered in banana pulp. I was able to rescue my data by cracking open the plastic outer shell, cleaning the flexible magnetic media in soapy water, placing it back into the plastic shell of a second diskette, and then copied the data off to a third diskette.
After graduating from college, Franz and I went our separate ways. I went to work for IBM, and Franz went to work for [Chiat/Day], the advertising agency famous for the 1984 Macintosh commercial. We still keep in touch through Facebook.
At IBM, I was given a 3270 terminal to do my job, and would not be assigned a personal computer until years later. Once I had a personal computer at home and at work, the floppy diskette became my "briefcase". I could download a file or document at work, take it home, work on it til the wee hours of the morning, and then come back the next morning with the updated effort.
To help prepare me for client visits and public speaking at conferences, IBM loaned me out to local schools to teach. This included teaching Computer Science 101 at Pima Community College. When asked by a student whether to use "disc" or "disk", I wrote a big letter "C" on the left side of the chalkboard, and a big letter "K" on the right side. If it is round, I told the students while pointing at the letter "C", like a CD-ROM or DVD, use "disc". If it has corners, pointing to corners of the letter "K", like a floppy diskette or hard disk drive, use "disk".
On one of my business trips to visit a client, we discovered the client had experienced a problem that we had just recently fixed. Normally, this would have meant cutting a Program Trouble Fix (PTF) to a 3480 tape cartridge at an IBM facility, and send it to the client by mail. Unwilling to wait, I offered to download the PTF onto a floppy diskette on my laptop, upload it from a PC connected to their systems, and apply it there. This involved a bit of REXX programming to deal with the differences between ASCII and EBCDIC character sets, but it worked, and a few hours later they were able to confirm the fix worked.
In 1998, Apple would signal the begining of the end of the floppy disk era, announcing their latest "iMac" would not come with an internal built-in floppy drive. David Adams has a great article on this titled [The iMac and the Floppy Drive: A Conspiracy Theory]. You can get external floppy drives that connect via USB, so not having an internal drive is no longer a big deal.
While teaching a Top Gun class to a mix of software and hardware sales reps, one of the students asked what a "U" was. He had noticed "2U" and "3U" next to various products and wondered what that was referring to. The "U" represents the [standard unit of measure for height of IT equipment in standard racks]. To help them visualize, I explained that a 5.25-inch floppy disk was "3U" in size, and a 3.5-inch floppy diskette was "2U". Thus, a "U" is 1.75 inches, the thinnest dimension on a two-by-four piece of lumber. Servers that were only 1U tall would be referred to as "pizza boxes" for having similar dimensions.
Every year, right around November or so, my friends and family bring me their old computers for me to wipe clean. Either I would re-load them with the latest Ubuntu Linux so that their kids could use it for homework, or I would donate it to charity. Last November, I got a computer that could not boot from a CD-ROM, forcing me to build a bootable floppy. This gave me a chance to check out the various 1-disk and 2-disk versions of Linux and other rescue disks. I also have a 3-disk set of floppies for booting OS/2 in command line mode.
So while this unexpected box of nostalgia derailed my efforts to clean out my garage this weekend, it did inspire me to try to get some of the old files off them and onto my PC hard drive. I have already retrieved some low-res photographs, some emails I sent out, and trip reports I wrote. While floppy diskettes were notorious for being unreliable, and this box of floppies has been in the heat and cold for many Arizonan summers and winters, I am amazed that I was able to read the data off most of them so far, all the way back to data written in 1989. While the data is readable, in most cases I can't render it into useful information. This brings up a few valuable lessons:
Backups are not Archives
Some of the files are in proprietary formats, such as my backups for TurboTax software. I would need a PC running a correct level of Windows operating system, and that particular software, just to restore the data. TurboTax shipped new software every year, and I don't know how forward or backward-compatible each new release was.
Another set of floppies are labeled as being in "FDBACK" format. I have no idea what these are. Each floppy has just two files, "backup.001" and "control.001", for example.
Backups are intended solely to protect against unexpected loss from broken hardware or corrupted data. If you plan to keep data as archives for long-term retention, use archive formats that will last a long time, so that you can make sense of them later.
Operating System Compatibility
Windows 7 and all of my favorite flavors of Linux are able to recognize the standard "FAT" file system that nearly all of my floppies are written in. Sadly, I have some files that were compressed under OS/2 operating system using software called "Stacker". I may have to stand up an OS/2 machine just to check out what is actually on those floppies.
You can't judge a book by its cover
Floppies were a convenient form of data interchange. Sometimes, I reused commercially-labeled floppies to hold personal files. So, just because a floppy says "America On-Line (AOL) version 2.5 Installation", I can't just toss it away. It might actually contain something else entirely. This means I need to mount each floppy to check on its actual contents.
So what will I do with the floppies I can't read, can't write, and can't format? I think I will convert them into a [retro set of coasters], to protect my new living room furniture from hot and cold beverages.
Each quarter since 2006, the [IBM Migration Factory] team has tallied the number of clients who have moved to IBM severs and storage systems from competitive hardware. We'll I've just seen the latest numbers, for the third quarter of 2010, and it looks like we set a new quarterly record with nearly 400 total migrations to IBM from Oracle/Sun and HP.
It's clear that companies and governments worldwide are seeing greater value in IBM systems, while Oracle and HP watch their customer bases erode. In just this past 3Q 2010, nearly 400 clients have moved over to IBM -- almost all of them from Oracle/Sun and HP. Of these, 286 clients migrated to IBM Power Systems, running AIX, Linux and IBM i operating systems, from competitors alone -- nearly 175 from Oracle/Sun and nearly 100 from HP. The number of migrations to IBM Power Systems through the first three quarters of 2010 is nearly 800, already exceeding the total for all of last year by more than 200.
Let's do the math.... Since IBM established its Migration Factory program in 2006, more than 4,500 clients have switched to IBM. More than 1,000 from Oracle/Sun and HP joined the exodus this year alone. In less than five years, almost 3,000 of these clients -- including more than 1,500 from Oracle/Sun and more than 1,000 from HP -- have chosen to run their businesses on IBM's Power Systems. That's more than a client per day making the move to IBM!
And as the servers go, so goes the storage. Clients are re-discovering IBM as a server and storage powerhouse, offering a strong portfolio in servers, disk and tape systems, and how synergies between servers and storage can provide them real business benefits.
Adding it all up, it's clear that IBM's multi-billion dollar investment in helping to build a smarter planet with workload-optimized systems is paying off -- and that, more and more, clients are selecting IBM over the competition to help them meet their business needs.
A client asked me to explain "Nearline storage" to them. This was easy, I thought, as I started my IBM career on DFHSM, now known as DFSMShsm for z/OS, which was created in 1977 to support the IBM 3850 Mass Storage System (MSS), a virtual storage system that blended disk drives and tape cartridges with robotic automation. Here is a quick recap:
Online storage is immediately available for I/O. This includes DRAM memory, solid-state drives (SSD), and always-on spinning disk, regardless of rotational speed.
Nearline storage is not immediately available, but can be made online quickly without human intervention. This includes optical jukeboxes, automated tape libraries, as well as spin-down massive array of idle disk (MAID) technologies.
Offline storage is not immediately available, and requires some human intervention to bring online. This can include USB memory sticks, CD/DVD optical media, shelf-resident tape cartridges, or other removable media.
Sadly, it appears a few storage manufacturers and vendors have been misusing the term "Nearline" to refer to "slower online" spinning disk drives. I find this [June 2005 technology paper from Seagate], and this [2002 NetApp Press Release], the latter of which included this contradiction for their "NearStore" disk array. Here is the excerpt:
"Providing online access to reference information—NetApp nearline storage solutions quickly retrieve and replicate reference and archive information maintained on cost-effective storage—medical images, financial models, energy exploration charts and graphs, and other data-intensive records can be stored economically and accessed in multiple locations more quickly than ever"
Which is it, "online access" or "nearline storage"?
If a client asked why slower drives consume less energy or generate less heat, I could explain that, but if they ask why slower drives must have SATA connections, that is a different discussion. The speed of a drive and its connection technology are for the most part independent. A 10K RPM drive can be made with FC, SAS or SATA connection.
I am opposed to using "Nearlne" just to distinguish between four-digit speeds (such as 5400 or 7200 RPM) versus "online" for five-digit speeds (10,000 and 15,000 RPM). The difference in performance between 10K RPM and 7200 RPM spinning disks is miniscule compared to the differences between solid-state drives and any spinning disk, or the difference between spinning disk and tape.
I am also opposed to using the term "Nearline" for online storage systems just because they are targeted for the typical use cases like backup, archive or other reference information that were previously directed to nearline devices like automated tape libraries.
Can we all just agree to refer to drives as "fast" or "slow", or give them RPM rotational speed designations, rather than try to incorrectly imply that FC and SAS drives are always fast, and SATA drives are always slow? Certainly we don't need new terms like "NL-SAS" just to represent a slower SAS connected drive.