Wrapping up my coverage of the annual [2010 System Storage Technical University], I attended what might be the best session of the conference. Jim Nolting, IBM Semiconductor Manufacturing Engineer, presented the new IBM zEnterprise mainframe, "A New Dimension in Computing", under the Federal track.
The zEnterprise debunks the "one processor fits all" myth. For some I/O-intensive workloads, the mainframe continues to be the most cost-effective platform. However, there are other workloads where a memory-rich Intel or AMD x86 instance might be the best fit, and yet other workloads where the high number of parallel threads of reduced instruction set computing [RISC] such as IBM's POWER7 processor is more cost-effective. The IBM zEnterprise combines all three processor types into a single system, so that you can now run each workload on the processor that is optimized for that workload.
- IBM zEnterprise z196 Central Processing Complex (CPC)
Let's start with the new mainframe z196 central processing complex (CPC). Many thought this would be called the z11, but that didn't happen. Basically, the z196 machine has a maximum of 96 cores versus the z10's maximum of 64 cores, and each core runs at 5.2GHz instead of the z10's 4.7GHz. It is available in air-cooled and water-cooled models. The primary operating system that runs on this is called "z/OS", which when used with its integrated UNIX System Services subsystem, is fully UNIX-certified. The z196 server can also run z/VM, z/VSE, z/TPF and Linux on z, which is just Linux recompiled for the z/Architecture chip set. In my June 2008 post [Yes, Jon, there is a mainframe that can help replace 1500 servers], I mentioned the z10 mainframe had a top speed of nearly 30,000 MIPS (Million Instructions per Second). The new z196 machine can do 50,000 MIPS, a 60 percent increase!
(Update: Back in 2007, IBM and Sun mutually supported [OpenSolaris on an IBM System z mainframe]. Unfortunately, after Oracle acquired Sun, the OpenSolaris Governing Board has [grown uneasy over Oracle's silence] about the future of OpenSolaris on any platform. The OpenSolaris [download site] identifies 2009.06 as the latest release, but only for x86 and SPARC chip sets. Apparently, the 2010.03 release expected five months ago in March has slipped. Now it looks official that [OpenSolaris is Dead].)
The z196 runs a hypervisor called PR/SM that allows the box to be divided into dozens of logical partitions (LPAR), and the z/VM operating system can also act as a hypervisor running hundreds or thousands of guest OS images. Each core can be assigned a specialty engine "personality": GP for general processor, IFL for z/VM and Linux, zAAP for Java and XML processing, and zIIP for database, communications and remote disk mirroring. Like the z9 and z10, the z196 can attach to external disk and tape storage via ESCON, FICON or FCP protocols, and through NFS via 1GbE and 10GbE Ethernet.
- IBM zEnterprise BladeCenter Extension (zBX)
There is a new frame called the zBX that basically holds two IBM BladeCenter chassis, each capable of 14 blades, for a total of 28 blades per zBX frame. For now, only select blade servers are supported inside, but IBM plans to expand this to include more as testing continues. The POWER-based blades can run native AIX, IBM's other UNIX operating system, and the x86-based blades can run Linux-x86 workloads, for example. Each of these blade servers can run a single OS natively, or run a hypervisor to have multiple guest OS images. IBM plans to look into running other POWER and x86-based operating systems in the future.
If you are already familiar with IBM's BladeCenter, then you can skip this paragraph. Basically, you have a chassis that holds 14 blades connected to a "mid-plane". On the back of the chassis, you have hot-swappable modules that snap into the other side of the mid-plane. There are modules for FCP, FCoE and Ethernet connectivity, which allow blades to talk to each other, as well as to external storage. BladeCenter Management modules serve as both the service processor and the keyboard, video and mouse Local Console Manager (LCM). All of the IBM storage options available to IBM BladeCenter apply to zBX as well.
Besides general purpose blades, IBM will offer "accelerator" blades that will offload work from the z196. For example, let's say an OLAP-style query is issued via SQL to DB2 on z/OS. In the process of parsing the complicated query, it creates a Materialized Query Table (MQT) to temporarily hold some data. This MQT contains just the columnar data required, which can then be transferred to a set of blade servers known as the Smart Analytics Optimizer (SAO), which then processes the request and sends the results back. The Smart Analytics Optimizer comes in various sizes, from small (7 blades) to extra large (56 blades, 28 in each of two zBX frames). A 14-blade configuration can hold about 1TB of compressed DB2 data in memory for processing.
- IBM zEnterprise Unified Resource Manager
You can have up to eight z196 machines and up to four zBX frames connected together into a monstrously large system. There are two internal networks. The intraensemble data network (IEDN) is a 10GbE network that connects all the OS images together, and can be further subdivided into separate virtual LANs (VLAN). The intranode management network (INMN) is a 1000 Mbps Base-T Ethernet that connects all the host servers together to be managed under a single pane of glass known as the Unified Resource Manager. It is based on IBM Systems Director.
By integrating service management, the Unified Resource Manager can handle Operations, Energy Management, Hypervisor Management, Virtual Server Lifecycle Management, Platform Performance Management, and Network Management, all from one place.
- IBM Rational Developer for System z Unit Test (RDz)
But what about developers and testers, such as those Independent Software Vendors (ISV) that produce mainframe software? How can IBM make their lives easier?
Phil Smith on z/Journal provides a history of [IBM Mainframe Emulation]. Back in 2007, three emulation options were in use in various shops:
- Open Mainframe, from Platform Solutions, Inc. (PSI)
- FLEX-ES, from Fundamental Software, Inc.
- Hercules, which is an open source package
None of these are viable options today. Nobody wanted to pay IBM for its Intellectual Property on the z/Architecture or license the use of the z/OS operating system. To fill the void, IBM put out an officially-supported emulation environment called IBM System z Professional Development Tool (zPDT) available to IBM employees, IBM Business Partners and ISVs that register through IBM Partnerworld. To help out developers and testers who work at clients that run mainframes, IBM now offers IBM Rational Developer for System z Unit Test, which is a modified version of zPDT that can run on an x86-based laptop or shared IBM System x server. Based on the open source [Eclipse IDE], the RDz emulates GP, IFL, zAAP and zIIP engines on a Linux-x86 base. A four-core x86 server can emulate a 3-engine mainframe.
With RDz, a developer can write code, compile and unit test all without consuming any mainframe MIPS. The interface is similar to Rational Application Developer (RAD), and so similar skills, tools and interfaces used to write Java, C/C++ and Fortran code can also be used for JCL, CICS, IMS, COBOL and PL/I on the mainframe. An IBM study ["Benchmarking IDE Efficiency"] found that developers using RDz were 30 percent more productive than using native z/OS ISPF. (I mention the use of RAD in my post [Three Things to do on the IBM Cloud]).
What does this all mean for the IT industry? First, the zEnterprise is perfectly positioned for [three-tier architecture] applications. A typical example could be a client-facing web-server on x86, talking to business logic running on POWER7, which in turn talks to a database on z/OS in the z196 mainframe. Second, the zEnterprise is well-positioned for government agencies looking to modernize their operations and significantly reduce costs, corporations looking to consolidate data centers, and service providers looking to deploy public cloud offerings. Third, IBM storage is a great fit for the zEnterprise, with the IBM DS8000 series, XIV, SONAS and Information Archive accessible from both z196 and zBX servers.
To learn more, see the [12-page brochure] or review the collection of [IBM Redbooks]. Check out the [IBM Conferences schedule] for an event near you. Next year, the IBM Storage University will be held July 18-22, 2011 in Orlando, Florida.
technorati tags: IBM, Technical University, zEnterprise, x86, POWER7, RISC, z/OS, Linux, AIX, OpenSolaris, Oracle, FICON, NFS, z196, zBX, DB2, SAO, IEDN, INMN, RDz, ISV, Eclipse, Cloud Computing
It's official! My "blook" Inside System Storage - Volume I is now available.
|This blog-based book, or “blook”, comprises the first twelve months of posts from this Inside System Storage blog, 165 posts in all, from September 1, 2006 to August 31, 2007. Foreword by Jennifer Jones. 404 pages.|
- IT storage and storage networking concepts
- IBM strategy, hardware, software and services
- Disk systems, Tape systems, and storage networking
- Storage and infrastructure management software
- Second Life, Facebook, and other Web 2.0 platforms
- IBM’s many alliances, partners and competitors
- How IT storage impacts society and industry
You can choose between hardcover (with dust jacket) or paperback versions:
This is not the first time I've been published. I have authored articles for storage industry magazines, written large sections of IBM publications and manuals, submitted presentations and whitepapers to conference proceedings, and even had a short story published with illustrations by the famous cartoon writer [Ted Rall].
But I can say this is my first blook, and as far as I can tell, the first blook from IBM's many bloggers on DeveloperWorks, and the first blook about the IT storage industry. I got the idea when I saw [Lulu Publishing] run a "blook" contest. The Lulu Blooker Prize is the world's first literary prize devoted to "blooks"--books based on blogs or other websites, including webcomics. The [Lulu Blooker Blog] lists past years' winners. Lulu is one of the new innovative "print-on-demand" publishers. Rather than printing hundreds or thousands of books in advance, as other publishers require, Lulu doesn't print them until you order them.
I considered cute titles like A Year of Living Dangerously, or An Engineer in Marketing La-La land, or Around the World in 165 Posts, but settled on a title that closely matched the name of the blog.
In addition to my blog posts, I provide additional insights and behind-the-scenes commentary. If you go to the Lulu website above, you can preview an entire chapter before purchase. I have added a hefty 56-page Glossary of Acronyms and Terms (GOAT) with over 900 storage-related terms defined, which also doubles as an index back to the post (or posts) that use or further explain each term.
So who might be interested in this blook?
- Business Partners and Sales Reps looking to give a nice gift to their best clients and colleagues
- Managers looking to reward early-tenure employees and retain the best talent
- IT specialists and technicians wanting a marketing perspective of the storage industry
- Mentors interested in providing motivation and encouragement to their proteges
- Educators looking to provide books for their classroom or library collection
- Authors looking to write a blook themselves, to see how to format and structure a finished product
- Marketing personnel that want to better understand Web 2.0, Second Life and social networking
- Analysts and journalists looking to understand how storage impacts the IT industry, and society overall
- College graduates and others interested in a career as a storage administrator
And yes, according to Lulu, if you order soon, you can have it by December 25.
technorati tags: IBM, blook, Volume I, Jennifer Jones, system, storage, strategy, hardware, software, services, disk, tape, networking, SAN, secondlife, Web2.0, facebook, Lulu, publishing, Blooker Prize, articles, magazines, proceedings, Ted Rall, insights, glossary, early-tenure, mentors, library, classroom, administrator, print, publish, on demand
Yesterday, I promised I would cover other products from the Feb 12 announcement. Today I will focus on the IBM SAN768B director. Some people are confused about the differences between switches and directors. I find there are three key differences:
- Directors are designed for 24x7 operation, highly available with no single points of failure or repair. Generally, all components in directors are redundant and hot-swappable, including Control Processors. In switches, some components are redundant and hot-swappable (such as fans and power supplies), but not the “motherboard” or controller. Often you have to take down a switch to make firmware or major hardware changes or upgrades.
- Directors are designed to take in "blades" with different features, port counts, or protocol capabilities. You can add or remove blades while the system is up and running. Switches have a fixed number of ports. (A Small Form-factor Pluggable optical transceiver [SFP] is the component that turns electric pulses into light pulses, and vice versa. You plug the SFP into the switch, and then the fiber optic cable is plugged into the SFP.)
With switches, you often start with a base number of active ports, and then can enable the rest of the ports as you need them.
- Directors have hundreds of ports. Switches tend to have 64 ports or fewer.
Last year, Brocade acquired McDATA. Both were OEMs for IBM, and IBM distinguished that in the naming convention. The IBM SAN***B name was used to denote products manufactured for IBM by Brocade, and a SAN***M name was used to denote products manufactured by McDATA.
At that time, Brocade and McDATA equipment did not mix very well on the same fabric, so IBM retained the naming convention so that you as a customer knew what it worked with.
Brocade has now released new levels of both operating systems--Brocade's FOS and McDATA's EOS--and their respective fabric managers--Brocade Fabric Manager (FM) and McDATA's Enterprise Fabric Connectivity Manager (EFCM)--so that they have full interoperability.
Brocade's goal is to enhance EFCM to be a common software management platform for all of their products going forward.
IBM used the maximum port count in the name to provide some clue as to the size of the switch or director. The SAN16B-2 and the SAN32B-3 are switches that have a maximum of 16 and 32 ports, respectively. The SAN256B supports a maximum of eight blades of your choosing. Two different types were supported for FC ports, a 16-port blade and a 32-port blade. If all eight were 32-port blades then the maximum was 256 ports, hence the name. But then Brocade began offering 48-port blades. Should IBM change the name? No, it decided to leave it the SAN256B even though it can now have a maximum of 384 ports.
Not to confuse anyone, the SAN768B also has a maximum of 384 ports, in the same 14U dimensions, but with a special twist. Normally to connect two directors together you use up ports from each, in what are called "inter-switch links" (ISL). These are ports you are taking away from the servers and storage controllers. The SAN768B offers a new alternative called "inter-chassis links" (ICL). Each SAN768B has two processing blades, and each has two ICL ports, so with just four two-meter (2m) cables, you get the equivalent of 128 FC 8 Gbps ISL links without using 128 individual ports on each side. That is like giving you 256 ports back for use with servers and storage!
Since IBM directors require 240 volt power, the IBM TotalStorage SAN Cabinet C36 includes power distribution units (PDUs). PDUs are just glorified power strips, but a new intelligent PDU (iPDU) option adds the ability to monitor energy consumption for customers looking to measure, and perhaps charge back, energy consumption to the rest of the business. You can stack two SAN768B in one cabinet, one on top of the other, and connected via ICLs, it would look like one huge 768-port backbone.
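The port arithmetic in the last two paragraphs can be checked with a few lines of Python. This is only a sketch of the numbers quoted in this post: the function and constant names are made up for illustration, and it assumes the SAN768B takes the same eight port blades as the SAN256B (which is what 384 ports with 48-port blades implies).

```python
# Port-count arithmetic from this post. Names are illustrative only,
# not taken from any IBM or Brocade tool.
BLADE_SLOTS = 8  # assumption: eight port-blade slots per director


def max_ports(ports_per_blade, slots=BLADE_SLOTS):
    """Maximum FC ports when every slot holds the same blade type."""
    return ports_per_blade * slots


def ports_reclaimed_by_icl(equivalent_isls, directors=2):
    """Ports freed when ICLs replace ISLs.

    Each ISL would consume one port on each of the connected directors.
    """
    return equivalent_isls * directors


print(max_ports(32))                # 256, the origin of the SAN256B name
print(max_ports(48))                # 384 with the newer 48-port blades
print(ports_reclaimed_by_icl(128))  # 256 ports given back across both sides
print(2 * max_ports(48))            # 768 ports for two ICL-connected chassis
```

The last line is where the "768" in the SAN768B name comes from: two fully populated chassis joined by ICLs.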
As a backbone for your data center, the SAN768B is positioned for two emerging technologies:
- 8 Gbps Fibre Channel (FC)
The SAN768B is powerful enough to have 32-port blades run full speed on all ports off-blade without oversubscription. Oversubscription is an emotional topic.
Normally, blades (like switches) can handle all traffic at full speed without delays provided the in-bound and out-bound ports involved are all on the same blade. In a director, however, if you need to communicate from a port on one blade to a port on a different blade, it is possible that off-blade traffic might be constrained or delayed in its transit across the backplane.
On the SAN768B, both the 16-port and 32-port blades can run at full 8 Gbps speed, and the 48-port is exposed to oversubscription only if you have more than 32-ports running at full 8 Gbps transferring data off-blade concurrently.
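To put a number on that, here is a small Python sketch of the worst-case ratio. It assumes, based only on the sentence above, that each blade can move the equivalent of 32 ports at full 8 Gbps off-blade; the function name is hypothetical.

```python
from fractions import Fraction

# Assumption drawn from the paragraph above: a blade can carry the
# equivalent of 32 ports at full 8 Gbps across the backplane.
FULL_SPEED_OFF_BLADE_PORTS = 32


def oversubscription(ports_on_blade, capacity=FULL_SPEED_OFF_BLADE_PORTS):
    """Worst-case demand-to-capacity ratio if every port sends off-blade."""
    return Fraction(ports_on_blade, capacity)


for ports in (16, 32, 48):
    ratio = oversubscription(ports)
    status = "oversubscribed" if ratio > 1 else "full speed"
    print(f"{ports}-port blade: ratio {ratio} ({status})")
```

Only the 48-port blade comes out above 1 (a ratio of 3/2), and only in the worst case where all 48 ports are pushing full-speed traffic to other blades at the same time.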
The new 8 Gbps SFPs support auto-negotiation at N-1 and N-2 generation link speeds. This means that they will automatically slow down when communicating with 4 Gbps and 2 Gbps devices, but they cannot communicate with 1 Gbps devices. If you are still using 1 Gbps devices in your data center, you will need to use 4 Gbps SFPs (which also support 2 Gbps and 1 Gbps link speeds) to communicate with those older devices.
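The N-1 and N-2 rule is easy to express in code. The following Python sketch is just an illustration of the rule as described here; the list of generations and the function names are my own, not part of any Fibre Channel API.

```python
# Successive Fibre Channel link-speed generations, in Gbps.
FC_GENERATIONS_GBPS = [1, 2, 4, 8]


def supported_speeds(sfp_gbps):
    """Speeds an SFP can negotiate: its own generation N, plus N-1 and N-2."""
    n = FC_GENERATIONS_GBPS.index(sfp_gbps)
    return FC_GENERATIONS_GBPS[max(0, n - 2): n + 1]


def can_link(sfp_gbps, device_gbps):
    """True if an SFP of the given speed can talk to the device."""
    return device_gbps in supported_speeds(sfp_gbps)


print(supported_speeds(8))  # [2, 4, 8]
print(supported_speeds(4))  # [1, 2, 4]
print(can_link(8, 1))       # False: a 1 Gbps device needs a 4 Gbps SFP
print(can_link(4, 1))       # True
```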
- Fibre Channel over Ethernet (FCoE)
Wikipedia has a good summary of [FCoE].
Basically, this new technology enables transport of Fibre Channel packets over 10 Gbps Ethernet links. This 10 Gbps Ethernet can also be used to carry traditional iSCSI and TCP/IP traffic. FCoE introduces new extensions to provide Fibre Channel characteristics, like being lossless, and offering consistent performance. The ANSI T11 team is driving FCoE as an open standard, and at the moment it is not fully baked. I suggest you don't buy any FCoE equipment prematurely, as pre-standard devices or host bus adapters could get you burned later when the standard is finalized.
The idea is that FCoE blades can be installed in a SAN768B along with traditional FC blades, allowing routing of traffic between traditional FC and new FCoE ports. Those who have invested in FCIP for long distance replication will be able to continue using either FC or FCoE inputs.
One of the big drivers of FCoE is IBM BladeCenter. Currently, most BladeCenter blades support both Ethernet and FC connectivity and are connected to both Ethernet and FC switches on the back of each BladeCenter chassis. With FCoE, we have the potential to run both FC and IP traffic across simpler all-Ethernet blades, connecting through all-Ethernet switches on the backs of each chassis.
For more information on the IBM SAN768B, see the [IBM Press Release]. For more details on Brocade's strategy, here is an 8-page white paper on their [Data Center Fabric] vision.
technorati tags: IBM, SAN768B, SAN, switch, director, backbone, SFP, Brocade, McDATA, BOS, EOS, BFM, EFCM, blade, ISL, ICL, FC, FCP, FCIP, FCoE, BladeCenter, Ethernet, 8Gbps, 10GbE, Data Center Fabric
Continuing my rant from Monday's post [Time for a New Laptop], I got my new laptop Wednesday afternoon. I was hoping the transition would be quick, but that was not the case. Here were my initial steps prior to connecting my two laptops together for the big file transfer:
- Document what my old workstation has
Back in 2007, I wrote a blog post on how to [Separate Programs from Data]. I have since added a Linux partition for dual-boot on my ThinkPad T60.
| /dev/sda1 | 26GB | NTFS | C: | Windows XP SP3 operating system and programs |
| /dev/sda2 | 12GB | ext3 | /(root) | Red Hat Enterprise Linux 5.4 |
| /dev/sda6 | 80GB | NTFS | D: | My Documents and other data |
I also created a spreadsheet of all my tools, utilities and applications. I combined and deduplicated the list from the following sources:
- Control Panel -> Add/Remove programs
- C:\Program Files
- Start -> Programs panels
- Program taskbar at bottom of screen
The last one was critical. Over the years, I have gotten in the habit of saving those ZIP or EXE files that self-install programs into a separate directory, D:/Install-Files, so that if I had to uninstall an application, due to conflicts or compatibility issues, I could re-install it without having to download it again.
So, I have a total of 134 applications, which I have put into the following rough categories:
- AV - editing and manipulating audio, video or graphics
- Files - backup, copy or manipulate disks, files and file systems
- Browser - Internet Explorer, Firefox, Opera and Google Chrome
- Communications - Lotus Notes and Lotus Sametime
- Connect - programs to connect to different Web and Wi-Fi services
- Demo - programs I demonstrate to clients at briefings
- Drivers - attach or sync to external devices, cell phones, PDAs
- Games - not much here, the basic Solitaire, Minesweeper and Pinball
- Help Desk - programs to diagnose, test and gather system information
- Projects - special projects like Second Life or Lego Mindstorms
- Lookup - programs to look up information, like American Airlines TravelDesk
- Meeting - I have FIVE different webinar conferencing tools
- Office - presentations, spreadsheets and documents
- Platform - Java, Adobe Air and other application runtime environments
- Player - do I really need SIXTEEN different audio/video players?
- Printer - print drivers and printer management software
- Scanners - programs that scan for viruses, malware and adware
- Tools - calculators, configurators, sizing tools, and estimators
- Uploaders - programs to upload photos or files to various Web services
- Back up my new workstation
My new ThinkPad T410 has a dual-core i5 64-bit Intel processor, so I burned a 64-bit version of [Clonezilla LiveCD] and booted the new system with that. The new system has the following configuration:
| /dev/sda1 | 320GB | NTFS | C: | Windows XP SP3 operating system, programs and data |
There were only 14.4GB of data, so it took 10 minutes to back up to an external USB disk. I ran it twice: first, using the option to dump the entire disk, and the second to dump the selected partition. The results were roughly the same.
- Run Workstation Setup Wizard
The Workstation Setup Wizard asks for all the pertinent location information, time zone, userid/password, needed to complete the installation.
- Re-Partition Disk Drive
I burned a 64-bit version of [System Rescue CD] and ran [Gparted] to re-partition this disk into the following:
| /dev/sda1 | 40GB | NTFS | C: | Windows XP SP3 operating system and programs |
| /dev/sda2 | 15GB | ext3 | /(root) | Ubuntu Desktop 10.04 LTS |
| /dev/sda6 | 245GB | NTFS | D: | My Documents and other data |
- Redefine Windows directory structure
I made two small changes to connect C: to D: drive.
- Changed "My Documents" to point to D:\Documents, which moves the files over from C: to D: to accommodate the new target location. See [Microsoft procedure] for details.
- Edited C:\notes\notes.ini to point to D:\notes\data to store all the local replicas of my email and databases.
- Install Ubuntu Desktop 10.04 LTS
My plan is to run Windows and Linux guests through virtualization. I decided to try out Ubuntu Desktop 10.04 LTS, affectionately known as Lucid Lynx, which can support a variety of different virtualization tools, including KVM, VirtualBox-OSE and Xen. I have two identical 15GB partitions (sda2 and sda3) that I can use to hold two different systems, or one can be a subdirectory of the other. For now, I'll leave sda3 empty.
- Take another backup of my new workstation
I took a fresh new backup of partitions (sda1, sda2, sda6) with Clonezilla.
The next step involved a cross-over Ethernet cable, which I don't have. So that will have to wait until Thursday morning.
technorati tags: IBM, Lenovo, ThinkPad, T60, T410, Intel, Clonezilla, SysRescCD, Gparted, Windows, Ubuntu, Linux, Lucid, LTS
In my post [What is the Smartest Machine on Earth?], I described the storage inside [IBM Watson], the computer that will compete against two humans on the quiz show Jeopardy!
"When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, the actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1 Terabyte (TB). For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained, Watson is NOT going to the internet searching for answers."
I had several readers ask me to explain the significance of the "Terabyte". I'll work my way up.
A bit is simply a zero (0) or one (1). This could answer a Yes/No or True/False question.
Most computers have standardized a byte as a collection of 8 bits. There are 256 unique combinations of ones and zeros possible, so a byte could be used to store a 2-digit integer, or a single upper or lower case character in the English alphabet. In practical terms, a byte could store your age in years, or your middle initial.
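For the programmers among my readers, here is a quick Python sketch of that idea. The age and the initial are made-up sample values.

```python
BITS_PER_BYTE = 8

# Each bit doubles the number of patterns, so 8 bits give 2**8 of them.
combinations = 2 ** BITS_PER_BYTE
print(combinations)  # 256

# A two-digit integer (0 to 99) fits comfortably in one byte.
age = 42  # hypothetical age in years
age_byte = age.to_bytes(1, "big")

# So does a single English letter in ASCII.
initial_byte = "Q".encode("ascii")  # hypothetical middle initial

print(len(age_byte), len(initial_byte))  # 1 1
print(int.from_bytes(age_byte, "big"))   # 42
```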
- Kilobyte (KB)
The Kilobyte is a thousand bytes, enough to hold a few paragraphs of text. A typical written page could be held in 4 KB, for example.
The IBM Challenge to play on Jeopardy! is being compared to the historic 1969 moon landing. To land on the moon, Apollo 11 had the "Apollo Guidance Computer" (AGC) which had 74KB of fixed read-only memory, and 2KB of re-writeable memory. Over [3500 IBM employees were involved] to get the astronauts to the moon and safely back to earth again.
The importance of this computer was highlighted in a [lecture by astronaut David Scott] who said: "If you have a basketball and a baseball 14 feet apart, where the baseball represents the moon and the basketball represents the Earth, and you take a piece of paper sideways, the thinness of the paper would be the corridor you have to hit when you come back."
- Megabyte (MB)
The Megabyte is a thousand KB, or a million bytes. The 3.5-inch floppy diskette, mentioned in my post [A Boxfull of Floppies] could hold 1.44MB, or about 360 pages of text.
The first commercial disk system, the [350 Disk Storage Unit, introduced in 1956 for the IBM RAMAC computer], could hold only 5 MB and was the size of two refrigerators. While 5MB might not seem like much today, it is enough to hold the [Complete works of Shakespeare]. That's right, all 42 plays, poems and sonnets.
In the article [Wikipedia as a printed book], the printing of a select 400 articles resulted in a book 29 inches thick. Those 5,000 pages would consume about 20 MB of space.
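The page counts in this section all follow from the rule of thumb of about 4 KB per written page. A small Python sketch, using decimal units (1 KB = 1,000 bytes) as I do throughout this post, with names chosen just for illustration:

```python
KB = 1_000
MB = 1_000 * KB
BYTES_PER_PAGE = 4 * KB  # rule of thumb from the Kilobyte section


def pages_that_fit(capacity_bytes):
    """Whole pages of text that fit in the given capacity."""
    return capacity_bytes // BYTES_PER_PAGE


def megabytes_for(pages):
    """Space in MB needed for the given number of pages."""
    return pages * BYTES_PER_PAGE / MB


print(pages_that_fit(1_440_000))  # 360 pages on a 1.44MB floppy
print(megabytes_for(5_000))       # 20.0 MB for the printed Wikipedia book
```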
One of my favorite resources I use to search is the Internet Movie Data Base [IMDB]. Leaving out the photos and videos, the [text-only portion of the IMDB database is just over 600 MB], representing nearly all of the actors, awards, nominations, television shows and movies. A standard CD-ROM can hold 700MB, so the text portion of the IMDB could easily fit on a single CD.
- Gigabyte (GB)
The Gigabyte is a thousand MB, or a billion bytes. My Thinkpad T410 laptop has 4GB of RAM and 320GB of hard disk space. My laptop comes with a DVD burner, and each DVD can hold up to 4.7GB of information.
The popular Wikipedia now has some 17 million articles, of which 3.5 million are in the English language. It would only take [14GB of space to hold the entire English portion] of Wikipedia. That is small enough to fit on twenty CDs, three DVDs, an Apple iPad or my cellphone (a Samsung Galaxy S Vibrant).
Perhaps you are thinking, "Someone should offer Wikipedia pre-installed on a small handheld!" Too late. The [Humane Reader] is able to offer 5,000 books and Wikipedia in a small device that connects to your television. This would be great for people who do not have access to the internet, or for parents who want their kids to do their homework, but not be online while they are doing it.
In the latest 2009 report of [How Much Information?] from the University of California, San Diego, the average American consumes 34 GB of information each day. This includes all the information from radio, television, newspapers, magazines, books and the internet that a person might look at or listen to throughout the day. This project is sponsored by IBM and others to help people understand the nature of our information-consumption habits.
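The disc counts are a simple round-up division. Here is a Python sketch using the capacities mentioned in this post (700MB per CD, 4.7GB per DVD); the function name is made up for illustration.

```python
WIKIPEDIA_EN_MB = 14_000  # 14GB, in decimal megabytes
CD_MB = 700
DVD_MB = 4_700


def discs_needed(data_mb, disc_capacity_mb):
    """Round up: a partly filled disc still counts as a disc."""
    return -(-data_mb // disc_capacity_mb)


print(discs_needed(WIKIPEDIA_EN_MB, CD_MB))   # 20
print(discs_needed(WIKIPEDIA_EN_MB, DVD_MB))  # 3
```

Working in whole megabytes keeps the division in integers, which avoids any floating-point rounding surprises.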
Back in 1992, I visited a client in Germany. Their 90 GB of disk storage attached to their mainframe was the size of three refrigerators, and took five full-time storage administrators to manage.
- Terabyte (TB)
The Terabyte is a thousand GB, or a trillion bytes. It is now possible to buy an external USB drive for your laptop or personal computer that holds 1TB or more. However, at the 40MB/sec speeds that USB 2.0 is capable of, it would take seven hours to do a bulk transfer in or out of the device.
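If you want to check the seven-hour figure, here is the arithmetic as a Python sketch. It assumes a sustained 40MB/sec with no protocol overhead, which is optimistic for a real USB 2.0 drive.

```python
MB = 10 ** 6
TB = 10 ** 12


def transfer_hours(capacity_bytes, rate_bytes_per_sec):
    """Hours to move the given capacity at a constant transfer rate."""
    return capacity_bytes / rate_bytes_per_sec / 3600


hours = transfer_hours(1 * TB, 40 * MB)
print(f"{hours:.1f} hours")  # 6.9 hours, or roughly seven
```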
IBM offers 1TB and 2TB disk drives in many of our disk systems. In 2008, IBM was preparing to announce the first 1TB tape drive. However, Sun Microsystems announced their own 1TB drive the day before our big announcement, so IBM had to rephrase the TS1130 announcement to [The World's Fastest 1TB tape drive!]
In his book [The Singularity is Near: When Humans Transcend Biology], Ray Kurzweil estimates the human brain's memory can hold about 1.25 TB of information. This would make IBM Watson about 80 percent human.
A typical academic research library will hold about 2TB of information. The [US Library of Congress] print collection is considered to be about 10TB, and their web capture team has collected 160TB of digital data. If you are ever in Washington DC, I strongly recommend a visit to the Library of Congress. It is truly stunning!
Full-length computer animated movies, like [Happy Feet], consume about 100TB of disk storage during production. IBM offers disk systems that can hold this much data. For example, the IBM XIV can hold up to 151 TB of usable disk space in the size of one refrigerator.
A Key Performance Indicator (KPI) for some larger companies is the number of TB that can be managed by a full-time employee, referred to as TB/FTE. Discussions about TB/FTE are available from IT analysts including [Forrester Research] and [The Info Pro].
The website [Ancestry.com] claims to have over 540 million names in its genealogical database, with a storage of 600TB, with the inclusion of [US census data from 1790 to 1930]. The US government took nine years to process the 1880 census, so for the 1890 census, it rented equipment from Herman Hollerith's Tabulating Machine Company. This company would later merge with two others in 1911 to form what is now called IBM.
- Petabyte (PB)
A Petabyte is a thousand TB, or a quadrillion bytes. It is estimated that all printed materials on Earth would represent approximately 200 PB of information.
IBM's largest disk system, the Scale Out Network Attached Storage (SONAS), comprises up to 7,200 disk drives and can hold over 11 PB of information. A smaller 10-frame model, the same size as IBM Watson, with six interface nodes and 19 storage pods, could hold over 7 PB of information.
IBM's automated [TS3500 tape library with high-density frames] can hold up to 60 PB of compressed tape data. A smaller 10-frame model, the size of IBM Watson, could hold up to 36 PB of data.
For those of us in the IT industry, 1TB is small potatoes. I for one, was expecting it to be much bigger. But for everyone else, the equivalent of 200 million pages of text that IBM Watson has loaded inside is an incredibly large repository of information. I suspect IBM Watson probably contains the complete works of Shakespeare as well as other fiction writers, the IMDB database, all 3.5 million articles of Wikipedia, religious texts like the Bible and the Quran, famous documents like the Magna Carta and the US Constitution, and reference books like a Dictionary, a Thesaurus, and "Gray's Anatomy". And, of course, lots and lots of lists.
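As a sanity check, the 200 million pages line up with the "under 1 Terabyte" quoted at the top of this post, if you use the same rough 4 KB per page from the Kilobyte section. A quick Python sketch:

```python
KB = 10 ** 3
TB = 10 ** 12
BYTES_PER_PAGE = 4 * KB  # rough rule of thumb from the Kilobyte section

pages = 200_000_000
total_bytes = pages * BYTES_PER_PAGE

print(total_bytes / TB)  # 0.8, comfortably under 1 TB
```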
For those on Twitter, follow [@ibmwatson] these next three days during the challenge.
technorati tags: IBM, Watson, Challenge, Jeopardy, @ibmwatson, #ibmwatson