Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
Tony Pearson is a Master Inventor and Senior Software Engineer for the IBM Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th year anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services. You can also follow him on Twitter @az990tony.
(Short URL for this blog: ibm.co/Pearson
I attended the main tent sessions on Day 2 (Monday). The focuswas on Visibility, Control and Automation.
Steve is IBM senior VP and Group Executive of the IBM Software Group, and presented someinsightful statistics from the IBM Global Technology Outlookstudy, some recent IBM wins, and other nuggets of IT trivia:
In 2001, there were about 60 million transistors per humanbeing. By 2010, this is estimated to increase to one billion per human
In 2005, there were about 1.3 billion RFID tags, by 2010this is estimated to grow to over 30 billion
IBM helped the City of Stockholm, Sweden, reduce traffic congestion 20-25% using computer technology
Only about 25% data is original, the remaining75% is replicated
In 2007, there were approximately 281 Exabytes (EB), expected to increase to 1800 EB by the year 2011
70 percent of unstructured data is user-created content, but 85 percent of this will be managed by enterprises
Only 20% of data is subject to compliance rules and standards, and about 30% subject to security applications
Human error is the primary reason for breaches, with34% of organizations experiencing a major breach in 2006
10% of IT budget is energy costs (power and cooling), and thiscould rise to 50% in the next decade
30 to 60 percent of energy is wasted. During the next 5 years, people will spend as much on energy as they will on new hardware purchases.
Al Zollar is the General Manager of IBM Tivoli. He discussedthe 20 some recent software acquisitions, including Encentuate and FilesX earlier this year.
"The time has come to fully industrialize operations" -- Al Zollar
What did Al mean about "industrizalize"? This is theclosed-loop approach of continuous improvement, including design, delivery and management.
Al used several examples from other industries:
Henry Ford used standardized parts and processautomation. Assembly of an automatobile went from 12 hours by master craftsmen, to delivering a new model T every 23 seconds off anassembly line.
Power generation was developed by Thomas Edison. A satellite picture showed the extent of the [Blackout of 2003 in Northeast US and Canada]. The time for "smart grid" has arrived, making sensors andmeters more intelligent. This allows non-essential IP-enabled appliances in our home or office to be turned off to reduce energy consumption.
[McCarran International Airport] integrated the management of 13,000 assets with IBM Tivoli Maximo Enterprise Asset Management (EAM) software, and was able to increase revenues through more accurate charge-back. Unlike traditional EnterpriseResource Planning (ERP) applications, EAM offers the deep management of four areas: production equipment, facilities, transportation, and IT.
When compared to these other industries, management of IT is in itsinfancy. The expansion of [Web 2.0] and Service-Oriented Architecture [SOA] is driving this need.What people need is a "new enterprise data center" that IBM Tivoli software can help you manage across operational boundaries. IBM can integrate through open standards with management software from Cisco, Sun, OracleMicrosoft, CA, HP, BMC Software, Alcatel Lucent, and SAP.Together with our ecosystems of technology partners, IBM ismeeting these challenges.
IBM clients have achieved return on investment from gettingbetter control of their environment. This week there are client experience presentations Sandia National Labs, Spirit AeroSystems, Bank of America, and BT Converged communication services.
Chris O'Connor used some of his staff as "actors" to show an incredible live demo of various Tivoli and Maximo products for the mythical launch of "Project Vitalize", thenew online web store for a new "Aero Z bike" from the mythical VCA Bike and Motorcycle company.
Shoel Perelman played the role of "CIO".The CIO locked down all spending, and asked the IT staff to make the shift from bricks-and-mortar to web salesof this new product on in 15 months. While the company andsituation were mythical, all the products that were part of thelive demo are all readily available.The CIO had three goals:
What do we have? where is it? what's connected to what?Traditionally, these would be answered from lists in spreadsheets.The CIO had a goal to deploy IBM Tivoli Application DependenceDiscover Manager (TADDM) which discovered all hardware and software,with an easy to understand view, and how each piece serves the business applications.
Each of the teams have processes, and needed them consistent andrepeatable, tightly linked together. Time is often wasted on thephone coordinating IT changes. For this, the CIO had a goalto deploy Tivoli Change and Configuration Management Database (CCMDB) for "strict change control".The process dashboard is accessible for all teams, to see how all projects are progressing. There is also aCompliance dashboard, which identifies all changes by role, clearly spelling out who can do what.
There is a lot of computerized machinery, Manufacturing assets and robotics. The CIO set a goal to "do more with existing people", and needed to automate key processes.Sales rep wanted to add a new distributor to key web portal.This was all done through their "service catalog", When they needed to deploy a new application, they were able to find servers with available capacity and adjust using automatic provisioning. Thanks to IBM, the IT staff no longer get paged at 3am in the morning, and fewer days are spent in the "war room". They now have confidence that thelaunch will be successful.
Ritika Gunnar played the role of "Operations manager". She highlightedfive areas:
"Service viewer" dashboard with green/yellow/red indicators forall of their edge, application and datbase servers. This allowsher to get data 4-5 times faster and more accurate.
Tivoli Enterprise Portal eliminates bouncingaround various products.
Tivoli Common Reporting for CPU utilization of all systems, helps find excess capacity usingIBM Tivoli Monitor
On average, 85 percent of problems are caused by IT changes to the environment. IBM can help find dependencies, so that changes in one area do not impact other areas unexpectedly
Process Automation will Show changes that have been completed, in progress, or overdue.She can see all steps in a task or change request. A"workflow" automates all the key steps that need to be taken.
Laura Knapp played the role of "Facilities manager". She wanted to See all processes that apply to her work using a role-based process dashboard. The advantage of using IBM is that it changes work habits, reduces overtimeby 42 percent, improves morale. The IT staff now works as team,collaborates more, and jobs get done faster with fewer mistakes.Employees are online, accessing, monitoring and managing dataquicker. In days not weeks.
IBM Tivoli Enterprise Console (TEP) served as a common vehicle.She was able to pull up floor plan online, displaying all of the managed assets and mapped features. With the temperature overlay from Maximo Spatial, she was able to review hot spots on data center floor. Heat can cause servers to fail or shut down.
Power utilization chart at peak loadsCan now anticipate, predict and watch power consumption,and were able to justify replacement with newer, more energy-efficient equipment.
The CIO got back on stage, and explained the great success of thelaunch. They use Webstore usage tracking, security tools tracking all new registrations, and trackingserver and storage load.It now only takes hours, not weeks, to add new business partners and distributors.Tivoli Service Quality Assurance toolstrack all orders placed, processed, and shipped.Faster responsiveness is competitive advantage. TheirIT department is no longer seen as stodgy group, but as a world classorganization.
The live demo showed how IBM can help clients with rapid decisionmaking, speed and accuracy of change processes, and automation to take actions quickly. The result is a strong return on investment (ROI).
Liz Smith, IBM General Manager of Infrastructure Services, presented the results of an IBM survey to CEOs and CIOs asking questions like: What is the next big impact? Where are you investing?What will new datacenter look like?
The five key traits they found for companies of the future:
They were hungry for change
Innovative beyond customer imagination
Disruptive by nature
Genuine, not just generous
The IT infrastructure must be secure, reliable, and flexible.Taking care of environment is a corporate responsibility, notjust a way to reduce costs.
The five entry points for IBM Service Management: Integrate, Industrialize,Discover,Monitor and Protect.IBM Service management and compliance are critical for theGlobally Integrated Enterprise, with repeatable, scalable and consistent processes that enablechange to an automated workflow. This reduces errors, risks and costs, and improves productivity.IBM has talent, assets and experience to help any client get there.
Lance lives in Austin, TX, where IBM Tivoli is headquartered,so this made a good choice as a keynote speaker.He is best known for winning seven "Tour de France" bicycle races in a row, but he spoke instead gave an inspirational talk about how he survived cancer.
In 1996, Lance was diagnosed with cancer. Surprisingly, He said it was thegreatest thing that happened to him, and gave him new perspective on his life, family and the sport ofbicycling.Back then, there wasn't a webMD, Google or other Web 2.0 socialnetworking sites for Lance to better understand what he wasgoing through, learn more about treatment options, or find othersgoing through the same ordeal.
After his treatment, he was considered "damaged goods" by manyof the leading European bicycle teams. So, he joined the US Postal Serviceteam, not known for their wins, but often invited to sell TVrights to American audiences. Collaborating with his coachesand other members of his team, he revolutionized the bicycling sport, analyzed everything about the race, and built up morale.He won the first "yellow jersey" in 1999, and did so each yearfor a total of seven wins.
Lance formed the [Livestrong foundation] to help other cancer survivors. Nike came to him and proposed donating 5 million "rubber bracelets"colored yellow to match his seven yellow jerseys, with the name "Livestrong" embossed on them, that his foundation couldthen sell for one dollar apiece to raise funds. What some thought was a silly idea at first has started amovement.At the 2004 Olympics, many athletes from all nations and religious backgrounds, wore these yellow braceletsto show solidarity with this cause.To date, the foundation has sold over 72 million yellow bracelets, and these have served to provide a symbol,a brand, a color identity, to his cause.
He explained that doctor's have a standard speech to cancer survivors.As a patient, you can go out this doorway and never tell anyone,keep the situation private. Or you can go out this other doorway, you tell everybody your story. Lance chose the latter, and he felt it was the best decision he ever made.He wrote a book titled [It's Not About the Bike: My Journey Back to Life].
His call to action for the audience: find out what can you do to make a difference.A million non-governmental organizations[NGO] have started in the past 10 years. Don't just give cash, also give your time and passion.
It seems like I just get out of one conference, and into another. This week I am at Pulse 2008, which combines the best of IBM Tivoli and Maximo into one conference.Like many conferences, this one starts on Sunday, and ends on Thursday.
We're at the Swan and Dolphin hotels at [Walt Disney World] in Orlando, Florida. I've been to several conferences in Orlando, but this is my first time at the Swanand Dolphin. (When I walked into the main lobby, I had a bout of "deja vu". IBM LotusSphere was here last year, and they had a complete replica made in SecondLife!)
If you haven't been to Walt Disney World resorts, whether for a conference or vacation,there are two things you need to know:
Nothing is within a short "walking distance", you need to take a bus or boat to get anywhere
Despite this, you will be doing a lot of walking, so wear comfortable shoes!
Pulse encouraged everyone to blog and take pictures posted onto FlickR, here are a few from Sunday:
Lou and Elizabeth from [Syclo], an IBM Business Partner
Mike and Megha from [Birlasoft] show off their accreditation
Greg Tevis explains FilesX, recently acquired by IBM
I'm glad this is the final day of the IBM Systems Technical Conference (STC08) here in Los Angeles.While I enjoyed the conference, one quickly reaches saturation point with all the information presented.
XIV Architecture Overview
Before this conference, many of the attendees didn't understandIBM's strategy, didn't understand Web 2.0 and Digital archive workloads,and didn't understand why IBM acquired XIV to offer "yet another disk systemthat servers LUNs to distributed server platforms." Brian Shermanchanged all that!
Brian Sherman, IBM Advanced Technical Support (ATS), is part of the exclusive dedicated XIVtechnical team to install these boxes at client locations, so he is very knowledgeable with the technical aspects of the architecture. He presented what the current XIV-branded model that clients can purchase now in select countries, and what the IBM-branded model will change when available worldwide.
Those who missed my earlier series on XIV can find them here:
Beyond this, Brian gave additional information on how thin provisioning, storage pools, disk mirroring, consistency groups, management consoles, and microcode updates are implemented.
N series and VMware Deep Dive
Norm Bogard, IBM Advanced Technical Support, presented why the IBM N series makes such great disk storage for VMware
deployments. This wasclearly labeled as a "deep dive", so anyone who got lost in all of theacronyms could not blame Norm for misrepresentation.
IBM has been doing server virtualization for over 40 years, so it makes sense thatit happens to be the number one reseller of VMware offerings.VMware ESX server is a hypervisor that runs on x86 host, and provides an emulationlayer for "guest Operating Systems". Each guest can hvae one or more virtualdisks, which are represented by VMware as VMDK files. VMware ESX server acceptsread/write requests from the guests, and forwards them on to physical storage.Many of VMware's most exciting features requires storage to be external to thehost machine. [VMotion]allows guests to move from one host to another, [Distributed Resource Scheduler (DRS)]allows a set of hosts to load-balance the guestsacross the hosts, and [High Availability (HA)] allows the guests on a failed hostto be resurrected on a surviving host. All of these require external disk storage.
ESX server allows up to 256 LUNs, attached via FCP and/or iSCSI, and up to 32 NFS mount points. Across LUNs, ESX server uses VMFS file system, which is a clusteredfile system like IBM GPFS that allows multiple hosts to access the same LUNs.ESX server has its own built-in native multipathing driver, and even provides FCP-iSCSIand iSCSI multipathing. In other words, you can have a LUN on an IBM System Storage N series thatis attached over both FCP and iSCSI, so if the SAN switch or HBA fails, ESX servercan failover to the iSCSI connection.
ESX server can use NFS protocol to access the VMDK files instead. While the default is only 8 NFS mount points, you can increase this to 32 mount points. NAS can takeadvantage of Link Aggregate Control Protocol [LACP] groups, what some call "trunking" or "EtherChannel". This is the ability to consolidate multiple streams onto fewer inter-switch Ethernet links, similar to what happens on SAN switches.For the IBM N series, IBM recommends a "fixed" path policy, rather than "most recently used".
IBM recommends disabling SnapShot schedules, and setting the Snap reserve to 0 percent.Why? A snapshot of an ESX server datastore has the VMDK files of many guests, all of which would have had to quiesce or stop to make the data "crash consistent" for theSnapshot of the datastore to even make any sense. So, if you want to take Snapshots, itshould be something you coordinate with the ESX server and its guest OS images, and notscheduled by the N series itself.
If you are running NFS protocol to N series, you can turn off the "accesstime" updates. In normal file systems, when you read a file, it updates the"access time" in the file directory. This can be useful if you are looking forfiles that haven't been read in a while, such as software that migrates infrequentlyaccessed files to tape. Assuming you are not doing that on your N series, you might as well turnoff this feature, and reduce the unnecessary write activity to the IBM N series box.
ESX server can also support "thin provisioning" on the IBM N series. There isa checkbox for "space reserved". Checked means "thick provisioning" and uncheckedmeans "thin provisioning". If you decide to use "thin provisioning" with VMware,you should consider setting AutoSize to automatically increase your datastorewhen needed, and to auto-delete-snap your oldest snapshots first.
The key advantage of using NFS rather than FCP or iSCSI is that it eliminates theuse of the VMFS file system. IBM N series has the WAFL file system instead, andso you don't have to worry about VMFS partition alignment issue. Most VMDK aremisaligned, so the performance is sub-optimal. If you can align each VMDK to a32KB or 64KB boundary (depending on guest OS), then you can get better performance.WAFL does this for you automatically, but VMFS does not. For Windows guests, use "Windows PE" to configurecorrectly-aligned disks. For UNIX or Linux guests, use "fdisk" utility.
What Industry Analysts are saying about IBM
Vic Peltz gave a presentation highlighting the accolades from securities analysts, IT analysts, and newsagencies about IBM and IBM storage products. For example, analysts like that IBM offersmany of the exciting new technologies their clients are demanding, like "thin provisioning", RAID-6 double-drive protection,SATA and Solid State Disk (SSD) drive technology.Analysts also like that IBM is open to non-IBM heterogeneous environments. Whereas EMC Celerra gateways supportonly EMC disk, IBM N series gateways and IBM SAN Volume Controller support a mix of IBM and non-IBM equipment.
Analysts also like IBM's "datacenter-wide" approach to issues like security and "Green IT". Rather than focusingon these issues with individual point solutions, IBM attacks these challenges with a complete"end-to-end" solution approach. A typical 25,000 square foot data center consumes $2.6 million dollars USD in power andcooling today, and IBM has proven technologies to reduce this cost in half. IBM's DS8000 on average consume26.5 to 27.8 percent less electricity than a comparable EMC DMX-4 disk system. IBM's tape systemsconsume less energy than comparable Sun or HP models.
IBM iDataPlex product technical presentation
Vallard Benincosa, IBM Technical Sales Specialist, presented the recently-announced [IBM System x iDataPlex].This is designed for our clients that have thousands of x86 servers, that buy servers "racks at a time", tosupport Web 2.0 and digital archive workloads. The iDataPlex is designed for efficient power and cooling,rapid scalability, and usable server density.
iDataPlex is such a radical design departure, that it might be difficult to describe in words.Most racks take up two floor tiles, each tile is 2 foot by 2 foot square. In that space, a traditionalrack would have servers that were 19 inches wide slide in horizontally, with flashing lights and hot-swappabledisks in the front, and all the power supply, fans and networking connections in the back. Even with IBM BladeCenter,you have chassis in these racks, and then servers slide in vertically in the front, and all of the power supply, fanand networking connections in the back. To access these racks, you have to be able to open the door on boththe front and back. And the cooling has to go through at least 26.5 inches from the front of the equipment to the back.
iDataPlex turns the rack sideways. Instead of two feet wide, and four feet deep, it is four feet wide, and two feet deep.This gives you two 19 inch columns to slide equipment into, and the air only has to travel 15 inches from frontto back. Less distance makes cooling more efficient.
Next, iDataPlex makes only thing in the back the power cord, controlled by an intelligent power distribution unit (iPDU) so you can turnthe power off without having to physically pull the plug. Everything else is serviced from the front door.This means that the back door can now be an optional "Rear Door Heat Exchanger" [RDHX] that is filled with running water to makecooling the rack extremely efficient. Water from a cooler distirubtion unit (CDU) can power about threeto four RDHX doors.
Let's say you wanted to compare traditional racks with iDataPlex for 84 servers. You can put 42 "1U" serversin two racks each, each rack requires 10 kVA (kilo-volt-amps) so you give it two 8.6 kVA feeds each, that is fourfeeds, and at $1500-2000 dollars USD per month, will cost you $6000-8000. The iDataPlex you can fit 84 serversin one 20 kVA rack, with only three 8.6 kVA feeds, saving you $1500-2000 dollars USD per month.
Fans are also improved. Fan efficiency is based on their diameter, so small fans in 1U servers aren't as effective as iDataPlex's 2U fans, saving about 12-49W per server. Whereas typical 1U server racks spend 10-20percent of their energy on the fans, the iDataPlex spends only about 1 percent, saving 8 to 36 kWH per year per rack.
Each 2U chassis snaps into a single power supply and a bank of 2U fans. A "Y"power cord allows you to have one cord for two power supplies. A chassis can hold either two small server "flexnodes"or one big "flexnode". An iDataPlex rack can hold up to 84 small servers or 42 big servers. Since each "Y" cord can power up to four "flexnode" servers, you greatly reduce the number of PDU sockets taken,leaving some sockets available for traditional 1U switches.
The small "flexnode" server can have one 3.5 inch HDD, or two 2.5 inch HDD, either SAS or SATA, and the big "flexnode" can have twice these.If you need more storage, there is a 2U chassis that holds five 3.5 inch HDD or eight 2.5 inch HDD. These areall "simple-swappable" (servers must be powered down to pull out the drives). For hot-swappable drives, a 3Uchassis with twelve 3.5 inch SAS or SATA drives.
The small "flexnode" server has one [PCI Express] slot, the big servers have two. Thesecould be used for [Myrinet] clustering. With only 25W power,the PCI Express slots cannot support graphics cards.
The iDataPlex is managed using the "Extreme Cluster Administration Toolkit" [XCAT]. This is an open source project under Eclipse that IBM contributes to.
Finally was the concept of "pitch". This is the distance from the center of one "cold aisle" to the next "cold aisle".On typical data centers, a pitch is 9 to 11 tiles. With the iDataPlex it is only three tiles when using the RDHX doors, or six tiles without. Most data centers run out of power and cooling before they run out of floor space, so having more dense equipmentdoesn't help if it doesn't also use less electricity.Since the iDataPlex uses 40 percent less power and cooling, you can pack more racks persquare foot of an existing data center floor with the existing power and cooling available. That is what IBM calls "usable density"!
What Did You Say? Effective Questioning and Listening Techniques
Maria L. Anderson, IBM Human Resources Learning, gave this "professional development" talk. I deal with different clients every week, so I fully understand that there is a mix of art and science incrafting the right questions and listening to the responses.The focus was on howto ask better questions and improve the understanding and communication during consultative engagements. Thisinvolves the appropriate mix of closed and open-ended questions, exchanging or prefacing as needed. This wasa good overview of the ERIC technique (Explore, Refine, Influence, and Confirm).
Well, that wraps up my week here in Los Angeles.Special thanks to my two colleagues, Jack Arnold and Glenn Hechler, both from the Tucson Executive Briefing Center,who helped me prepare and review my presentations!
Continuing this week in Los Angeles, I went to some interesting sessions today at theSystems Technical Conference (STC08).
System Storage Productivity Center (SSPC) - Install and Configuration
Dominic Pruitt, an IBM IT specialist in our Advanced Technical Support team, presented SSPC and howto install and configure it. For those confused between the difference of TotalStorage ProductivityCenter and System Storage Productivity Center, the former is pure software that you install on aWindows or Linux server, and the latter is an IBM server, pre-installed with Windows 2003, TotalStorageProductivity Center software, TPCTOOL command line interface, DB2 Universal Database, the DS8000 Element Manager, SVC GUI and CIMOM, and [PuTTY] rLogin/SSH/Telnet terminal application software.
Of course, the problem with having a server pre-installed with a lot of software is that there is alwayssomeone that wants to customize it further. For those who just want to manage their DS8000 disk systems,for example, it is possible to uninstall the SVC GUI, CIMOM and PuTTY, and re-install them later when youchange your mind. As a general rule, it is not wise to mix CIMOMs on the same machine, as it might causeconflicts with TCP ports or Java level requirements, so if you want a different CIMOM than SVC, uninstallthe SVC CIMOM first. For those who have SVC, the SSPC replaces the SVC Master Console, so you can safelyturn off the SVC CIMOM on your existing SVC Master Consoles.
The base level is TotalStorage Productivity Center "Basic Edition", but you can upgrade the Productivity Centerfor Disk, Data and Fabric components with license keys. You can also run Productivity Center for Replication,but IBM recommends adding processor and memory to do this (IBM offers this as an orderable option).Whether you have the TotalStorage software or SSPC hardware, Productivity Center has a cool role-to-groups mapping feature.You can create user groups, either on the Windows server, the Active Directory, or other LDAP, and then map which roles should be assigned to users in each group.
Since Productivity Center manages a variety of different disk systems, it has made anattempt to standardize some terminology. The term "storage pool" refers to an extentpool on the DS8000, or a managed disk group on the SAN Volume Controller. Since the DS8000 can support both mainframe CKD volumes and LUNs for distributed systems, theterm "volume" refers to a CKD volume or LUN, and "disk" refers to the hard disk drive (HDD).
To help people learn Productivity Center, IBM offers single-day "remote workshops"that use Windows Remote Desktop to allow participants to install, customize and usethe software with no travel required.
IBM Integrated Approach to Archiving
Dan Marshall, IBM global program manager for storage and data services on our Global Technology Services team, presented IBM's corporate-wide integration to support archive across systems, software and services.One attendee asked me why I was there, given that "archive" is one of my areas of subject matter expertise that I present often at the Tucson Executive Briefing Center. I find it useful to watch others present the material, even material that I helped to develop, to see a different slant or spin on each talking point.
Archive is one area that brings all parts of IBM together: systems, software and services.Dan provided a look at archive from the services angle, providing an objective unbiasedview of the different software and systems available to solve specific challenges.
Encryption Key Manager (EKM) Design and Implementation
Jeff Ziehm, IBM tape technical sales specialist, presented IBM's EKM software, how it works in a tape environment, and how to deploy it in various environments. Since IBM is allabout being open and non-proprietary, the EKM software runs on Java on a variety ofIBM and non-IBM operating systems. IBM offers "keytool" command line interface (CLI) for the LTO4 and TS1120 tape systems, and "iKeyMan" graphical user interface (GUI) for theTS1120. Since it runs on Java, IBM Business Partners and technical support personneloften just [download and install EKM]onto their own laptops to learn how to use it.
Virtual Tape Update
We had three presenters at this one. First, Jeff Mulliken, formerly from Diligent and now a full IBM employee, presented the current ProtecTier softwarewith the HyperFactor technology, then Abbe Woodcock, IBM tape systems, compared Diligent with IBM's TS7520 and just-announced TS7530virtual tape libraries, and finally Randy Fleenor, IBM tape sales leader, presented IBM's strategy going forward in tape virtualization.
Let's start with Diligent. The ProtecTier software runs on any x86-64 server withat least four cores and the correct Emulex host bus adapter (HBA) cards. Using Red HatEnterprise Linux (RHEL) as a base, the ProtecTier software performs its deduplication entirely in-lineat an "ingest rate" of 400-450 MB/sec. This is all possible using 4GB memory-resident "dictionary table" that can map up to 1 PB of back end physical storage, which could represent as much as 25PB of "nominal" storage. Theserver is then point-to-point or SAN-attached to Fibre Channel disk systems.
As we learned yesterday from Toby Marek's session, there are four ways to performdeduplication:
full-file comparisons. Store only one copy of identical files.
fixed-chunk comparisons. Files are carved up into fixed-size chunks, and each chunkis compared or hashed to existing chunks to eliminate duplicates.
variable-chunk comparisons. Variable-length chunks are hashed or diffed to eliminate duplicate data.
content-aware comparisons. If you knew data was in Powerpoint format, for example,you could compare text, photos or charts against other existing Powerpoint files toeliminate duplicates.
IBM System Storage N series Advanced Single Instance Storage (A-SIS) uses fixed-chunkmethod, and Diligent uses variable-chunk comparisons. Diligent does this using "dataprofiling". For example, let's say most of my photographs are pictures of people, buildings, landscapes, flowers and IT equipment. When I back these up, the Diligentserver "profiles" each, and determines if any existing data have a similar profilethat might have at least 50 percent similar content. Diligent than reads in the data that is mostly likely similar, does a byte-for-byte ["diff" comparison], and creates variable-lengthchunks that are either identical or unique to sections of the existing data. Theunique data is compressed with LZH and written to disk, and the sequential series of pointer segments representing the ingested file is written in a separate section on disk.
That Diligent can represent profiles for 1PB of data in as little as 4GB memory-residentdictionary is incredible. By comparison, 10TB data would require 10 million entries on a content-aware solution, and 1.25 billion entries for one based on hash-codes.
Abbe Woodcock presented the TS7530 tape system that IBM announced on Tuesday. It has some advantages over the current Diligent offering:
Hardware-based compression (TS7520 and Diligent use software-based compression)
1200 MB/sec (faster ingest rate than Diligent)
1.7PB of SATA disk (more disk capacity than Diligent)
Support for i5/OS (Diligent's emulation of ATL P3000 with DLT7000 tapes not supported on IBM's POWER systems running i5/OS)
Ability to attach a real tape library
NDMP backup to tape
tape "shredding" (virtual equivalent of degaussing a physical tape to erase all previously stored data)
Randy Fleenor wrapped up the session telling us IBM's strategy going forward with all of thevirtual tape systems technologies. Until then, IBM is working on "recipes" or "bundles", puttingDiligent software with specific models of IBM System x servers and IBM System Storage DS4000 disk systemsto avoid the "do-it-yourself" problems of its current software-only packaging.
Understanding Web 2.0 and Digital Archive Workloads
I got to present this in the last time slot of the day, just before everyone headed off to the [Westin Bonaventure hotel] for our big fancy barbecue dinner. Like my previous sessionon IBM Strategy, this session was more oriented toward a sales audience, but both garnereda huge turn-out and were well-received by the technical attendees.
This session was requested because these new applications and workloads are what is driving IBM to acquire small start-ups like XIV, deploy Scale-Out File Services (SOFS), and develop the innovative iDataPlex server rack.
The session was fun because it was a mix of explanation of the characteristics ofWeb 2.0 services; my own experience as a blogger and user of Google Docs, FlickR, Second Life andTivo; and an exploration in how database and digital archives will impact thegrowth in computing and storage requirements.
I'll expand on some of these topics in later blog posts.
My session was the first in the morning, at 8:30am, but managed to pack the room full of people. A few looklike they just rolled in from Brocade's special get-together in Casey's Irish Pub the night before.I presented how IBM's storage strategy for the information infrastructure fits into the greater corporate-wide themes.To liven things up, I gave out copies of my book[Inside System Storage: Volume I] to those who asked or answered the toughest questions.
Data Deduplication and IBM Tivoli Storage Manager (TSM)
IBM Toby Marek compared and contrasted the various data deduplication technologies and products available, andhow to deploy them as the repository for TSM workloads. She is a software engineer for our TSM software product,and gave a fair comparison between IBM System Storage N series Advanced Single Instance Storage (A-SIS), IBMDiligent, and other solutions out in the marketplace.If you are going to combine technologies, then it isbest to dedupe first, then compress, and finally encrypt the data. She also explained about the many cleverways that TSM does data reduction at the client side greatly reduces the bandwidth traffic over the LAN,as well as reducing disk and tape resources for storage. This includes progressive "incremental forever" backup for file selection, incremental backups for databases, and adaptive sub-file backup.Because of these data reduction techniques, you may not get as much benefit as deduplication vendors claim.
The Business Value of Energy Efficiency Data Centers
Scott Barielle did a great job presenting the issues related to the Green IT data center. He is part of IBM"STG Lab Services" team that does energy efficiency studies for customers. It is not unusual for his teamto find potential savings of up to 80 percent of the Watts consumed in a client's data center.
IBM has done a lot to make its products more energy efficient. For example, in the United States, most datacenters are supplied three-phase 480V AC current, but this is often stepped down to 208V or 110V with powerdistribution units (PDUs). IBM's equipment allows for direct connection to this 480V, eliminating the step-downloss. This is available for the IBM System z mainframe, the IBM System Storage DS8000disk system, and larger full-frame models of our POWER-based servers, and will probably be rolled out to someof our other offerings later this year. The end result saves 8 to 14 percent in energy costs.
Scott had some interesting statistics. Typical US data centers only spend about 9 percent of their IT budgeton power and cooling costs. The majority of clients that engage IBM for an energy efficiency study are not tryingto reduce their operational expenditures (OPEX), but have run out, or close to running out, of total kW ratingof their current facility, and have been turned down by their upper management to spend the average $20 million USDneeded to build a new one. The cost of electricity in the USA has risen very slowly over the past 35 years, andis more tied the to fluctuations of Natural Gas than it is to Oil prices.(a recent article in the Dallas News confirmed this:["As electricity rates go up, natural gas' high prices, deregulation blamed"])
Cognos v8 - Delivering Operational Business Intelligence (BI) on Mainframe
Mike Biere, author of the book [BusinessIntelligence for the Enterprise], presented Cognos v8 and how it is being deployed for the IBMSystem z mainframe. Typically, customers do their BI processing on distributed systems, but 70 percent of the world's business data is on mainframes, so it makes sense to do yourBI there as well. Cognos v8 runs on Linux for System z, connecting to z/OS via [Hypersockets].
There are a variety of other BI applications on the mainframe already, including DataQuant,AlphaBlox, IBI WebFocus and SAS Enterprise Business Intelligence. In addition to accessing traditional onlinetransaction processing (OLTP) repositories like DB2, IMS and VSAM, using the [IBM WebSphere ClassicFederation Server], Cognos v8 can also read Lotus databases.
Business Intelligence is traditionally query, reporting and online analytics process (OLAP) for the top 10 to 15 percent of the company, mostly executives andanalysts, for activities like business planning, budgeting and forecasting. Cognos PowerPlay stores numericaldata in an [OLAP cube] for faster processing.OLAP cubes are typically constructed with a batch cycle, using either "Extract, Transfer, Load" [ETL], or "Change Data Capture" [CDC], which playsto the strength of IBM System z mainframe batch processing capabilities.If you are not familiar with OLAP, Nigel Pendse has an article[What is OLAP?] for background information.
Over the past five years, BI is now being more andmore deployed for the rest of the company, knowledge workers tasked with doing day-to-day operations. Thisphenomenom is being called "Operational" Business Intelligence.
IBM Glen Corneau, who is on the Advanced Technical Support team for AIX and System p, presented the IBMGeneral Parellel File System (GPFS), which is available for AIX, Linux-x86 and Linux on POWER.Unfortunately, many of the questions were related to Scale Out File Services (SOFS), which my colleague GlennHechler was presenting in another room during this same time slot.
GPFS is now in its 11th release since its introducing in 1997. All of the IBM supercomputers on the [Top 500 list] use GPFS. The largest deployment of GPFS is 2241 nodes.A GPFS environment can support up to 256 file systems, each file system can have up to 2 billion filesacross 2 PB of storage. GPFS supports "Direct I/O" making it a great candidate for Oracle RAC deployments.Oracle 10g automatically detects if it is using GPFS, and sets the appropriate DIO bits in the stream totake advantage of GPFS features.
Glen also covered the many new features of GPFS, such as the ability to place data on different tiers ofstorage, with policies to move to lower tiers of storage, or delete after a certain time period, all conceptswe call Information Lifecycle Management. GPFS also supports access across multiple locations and offersa variety of choices for disaster recovery (DR) data replication.
Perhaps the only problem with conferences like this is that it can be an overwhelming["fire hose"] of information!
This week I'm in Los Angeles for the Systems Technology Conference (STC '08).We have over 1900 IT professionals attending, of which 1200 IBMers from North America, Latin America,and Asia Pacific regions, as well as another 350 IBM Business Partners. The rest, including me, are world wideor from other areas.
Last January, IBM reorganized its team to be more client-focused. Instead of focused on products, we are nowclient-centric, and have teams to cover our large enterprise systems through direct sales force, business systemsfor sales through our channel business partners, and industry systems for specific areas like deep computing,digital surveillance and retail systems solutions.
In addition to 788 sessions to attend these next four days, we had a few main tent sessions.My third line (my boss' boss' boss) David Gelardi presented Enterprise Systems. This is the group I am in.
Akemi Watanabe presented for Business Systems. Her native language is Japanese, so to do an entire talk inEnglish was quite impressive. Her focus is on SMB accounts, those customers with less than 1000 employeesthat are looking for easy-to-use solutions. She mentioned IBM's new [Blue Business Platform] which includesLotus Foundation Start, an Application Integration Toolkit, and the Global Application Marketplace.
Part of this process is the merger of System p and System i into "POWER" systems, and then offering both midrangeand enterprise versions of these that run AIX, i5/OS and Linux on POWER. It turns out that only 9 percent of ourSystem i customers are only on this platform. Another 87 percent have Windows, so it makes sense to offer i5/OSon BladeCenter, to consolidate Windows servers from HP, Dell or Sun over to IBM.
Meanwhile, IBM's strategy to support Linux has proven successful. 25 percent of x86 servers now run Linux. IBMhas 600 full-time developers for Linux, over 500 of which contributed to the latest 2.6 kernel development. Our ["chiphopper"] program has successfullyported over 900 applications. There are now over 6500 applications that run on Linux applications, on our strategic alliances with Red Hat (RHEL) and Novell (SUSE) distributions of Linux.
Her recommendation to SMB reps: learn POWER systems, BladeCenter, and Linux. I agree!
Mary Coucher presented Industry systems. In addition to the game chips for the Sony Playstation, Nintendo Wii,and Microsoft Xbox-360, this segment focuses on Digital Video Surveillance (DVS), Retail Solutions, Healthcare and Life sciences (HCLS), OEM and embedded solutions, and Deep computing. She mentioned our recently announcediDataPlex solution.
IBM is focused on "real-world-aware" applications, which includes traffic, crime, surveillance, fraud, andRFID enablement. These are streams of data that happen real-time, that need to be dealt with now, not later.
Most people know that IBM has the majority of the top 500 supercomputers, but few may not realize that IBMalso has delivered solutions to the top 100 green companies. IBM success is explained in more detail in this[Press Release].
The group split up to four different platform meetings: Storage, Modular, Power, and Mainframe. Barry Rudolphpresented for the Storage platform. He talked about the explosion in information, business opportunities,risk and cost management. IBM has shifted from being product-focused, to the stack of servers and storage,to our latest focus on solutions across the infrastructure. He mentioned our DARPA win for [PERCS] which stands for productive,easy-to-use, reliable computing system.
My theme this week was to focus on "Do-it-Yourself" solutions, such as the "open storage" concept presentedby Sun Microsystems, but it has morphed into a discussion on vendor lock-in. Both deserve a bit of furtherexploration.
There were several reasons offered on why someone might pursue a "Do-it-Yourself" course of action.
Building up skills
In my post [Simply Dinners and Open Storage], I suggested that building a server-as-storage solution based on Sun's OpenSolaris operating system could serve to learn more about [OpenSolaris], and by extension, the Solaris operating system.Like Linux, OpenSolaris is open source and has distributions that run on a variety of chipsets, from Sun's ownSPARC, to commodity x86 and x86-64 hardware. And as I mentioned in my post [Getting off the island], a version of OpenSolaris was even shown to run successfully on the IBM System z mainframe.
"Learning by Doing" is a strong part of the [Constructivism] movement in education. TheOne Laptop Per Child [OLPC] uses this approach. IBM volunteers in Tucson and 40other sites [help young students build robots]constructed from [Lego Mindstorms]building blocks.Edward De Bono uses the term [operacy] to refer to the"skills of doing", preferred over just "knowing" facts and figures.
However, I feel OpenSolaris is late to the game. Linux, Windows and MacOS are all well-established x86-based operating systems that most home office/small office users would be familiar with, and OpenSolaris is positioning itself as "the fourth choice".
In my post[WashingtonGets e-Discovery Wakeup Call], I suggested that the primary motivation for the White House to switch from Lotus Notes over to Microsoft Outlookwas familiarity with Microsoft's offerings. Unfortunately, that also meant abandoning a fully-operational automated email archive system, fora manual do-it-yourself approach copying PST files from journal folders.
Familiarity also explains why other government employees might print out their emails and archive them on paperin filing cabinets. They are familiar with this process, it allows them to treat email in the same manner as they have treated paper documents in the past.
Cost, Control and Unique Requirements
The last category of reasons can often result if what you want is smaller or bigger than what is availablecommercially. There are minimum entry-points for many vendors. If you want something so small that it is notprofitable, you may end up doing it yourself. On the other end of the scale, both Yahoo and Google ended up building their data centers with a do-it-yourself approach, because no commercial solutions were available atthe time. (IBM now offers [iDataPlex], so that has changed!)
While you could hire a vendor to build a customized solution to meet your unique requirements, it might turn outto be less costly to do-it-yourself. This might also provide some added control over the technologies and components employed. However, as EMC blogger Chuck Hollis correctly pointed out for[Do-it-yourself storage],your solution may not be less costly than existingoff-the-shelf solutions from existing storage vendors, when you factor in scalability and support costs.
Of course, this all assumes that storage admins building the do-it-yourself storage have enough spare time to do so. When was the last time your storage admins had spare time of any kind?Will your storage admins provide the 24x7 support you could get from established storage vendors? Will theybe able to fix the problem fast enough to keep your business running?
From this, I would gather that if you have storage admins more familiar with Solaris than Linux, Windows or MacOS,and select commodity x86 servers from IBM, Sun, HP, or Dell, they could build a solution that has less vendor lock-in than something off-the-shelf from Sun. Let's explore the fears of vendor lock-in further.
The storage vendor goes out of business
Sun has not been doing so well, so perhaps "open storage" was a way to warn existing Sun storage customers thatbuilding your own may be the next alternative.The New York Times title of their article says it all:["Sun Microsystems Posts Loss and Plans to Reduce Jobs"]. Sun is a big company, so I don't expect them to close their doors entirely this year,but certainly fear of being locked-in to any storage vendor's solution gets worse if you fear the vendor might go out of business.
The storage vendor will get acquired by a vendor you don't like
We've seen this before. You don't like vendor A, so you buy kit from vendor B, only to have vendor A acquire vendorB after your purchase. Surprise!
The storage vendor will not support new applications, operating systems, or other new equipment
Here the fear is that the decisions you make today might prevent you from choices you want to make in the future.You might want to upgrade to the latest level of your operating system, but your storage vendor doesn't supportit yet. Or maybe you want to upgrade your SAN to a faster bandwidth speed, like 8 Gbps, but your storage vendordoesn't support it yet. Or perhaps that change would require re-writing lots of scripts using the existingcommand line interfaces (CLI). Or perhaps your admins would require new training for the new configuration.
The storage vendor will raise prices or charge you more than you expect on follow-on upgrades
For most monolithic storage arrays, adding additional disk capacity means buying it from the same vendor as the controller. I heard of one company recently who tried to order entry-level disk expansion drawer, at a lower price, solely to move the individual disk drives into a higher-end disk system. Guess what? It didn't work. Most storage vendors would not support such mixed configurations.
If you are going to purchase additional storage capacity to an existing disk system, it should cost no more thanthe capacity price rate of your original purchase. IBM offers upgrades at the going market rate, but not all competitors are this nice. Some take advantage of the vendor lock-in, charging more for upgrades and pocketing the difference as profit.
Vendor lock-in represents the obstacles in switching vendors in the event the vendor goes out of business, failsto support new software or hardware in the data center, or charges more than you are comfortable with. These obstacles can make it difficult to switch storage vendors, upgrade your applications, or meet otherbusiness obligations. IBM SANVolume Controller and TotalStorage Productivity Center can help reduce or eliminate many of these concerns. IBMGlobal Services can help you, as much or as little, as you want in this transformation. Here are the four levelsof the do-it-yourself continuum:
Let me figure it out myself
Tell me what to do
Help me do it
Do it for me
This is the self-service approach. Go to our website, download an [IBM Redbook], figure out whatyou need, and order the parts to do-it-yourself.
IBM Global Business Services can help understand your business requirementsand tell you what you need to meet them.
IBM Global Technology Services can help design, assemble and deploy asolution, working with your staff to ensure skill and knowledge transfer.
IBM Managed Storage Services can manage your storage, on-site at your location, or at an IBM facility. IBM provides a varietyof cloud computing and managed hosting services.
So, if you are currently a Sun server or storage customer concerned about these latest Sun announcements, give IBM a call, we'll help you switch over!
He feels I was unfair to accuse EMC of "proprietary interfaces" without spelling out what I was referring to. Here arejust two, along with the whines we hear from customers that relate to them.
EMC Powerpath multipathing driver
Typical whine: "I just paid a gazillion dollars to renew my annual EMC Powerpath license, so you will have to come back in 12 months with your SVC proposal. I just can't see explaining to my boss that an SVC eliminates the need for EMC Powerpath, throwing away all the good money we just spent on it, or to explain that EMC chooses not to support SVC as one of Powerpath's many supported devices."
EMC SRDF command line interface
Typical whine: "My storage admins have written tons of scripts that all invoke EMC SRDF command line interfacesto manage my disk mirroring environment, and I would hate for them to re-write this to use IBM's (also proprietary) command line interfaces instead."
Certainly BarryB is correct that IBM still has a few remaining "proprietary" items of its own. IBM has been in business over 80 years, but it was only the last 10-15 years that IBM made a strategic shift away from proprietary and over to open standards and interfaces. The transformation to "openness" is not yet complete, but we have made great progress. Take these examples:
The System z mainframe - IBM had opened the interfaces so that both Amdahl and Fujitsu made compatible machines.Unlike Apple which forbids cloning of this nature, IBM is now the single source for mainframes because the other twocompetitors could not keep up with IBM's progress and advancements in technology.
Update: Due to legal reasons, the statements referring to Hercules and other S/390 emulators havebeen removed.
The z/OS operating system - While it is possible to run Linux on the mainframe, most people associate the z/OSoperating system with the mainframe. This was opened up with UNIX System Services to satisfy requests from variousgovernments. It is now a full-fledged UNIX operating system, recognized by the [Open Group] that certifies it as such.
As BarryB alludes, the unique interfaces for disk attachment to System z known as Count-Key-Data (CKD) was published so that both EMC and HDS can offer disk systems to compete with IBM's high-end disk offerings. Linux on System zsupports standard Fibre Channel, allowing you to attach an IBM SVC and anyone's storage. Both z/OS and Linux on System z support NAS storage, so IBM N series, NetApp, even EMC Celerra could be used in that case.
The System i itself is still proprietary, but recently IBM announced that it will now support standard block size (512 bytes) instead of the awkward 528 byte blocks that only IBM and EMC support today. That means that any storage vendor will be ableto sell disk to the System i environment.
Advanced copy services, like FlashCopy and Metro Mirror, are as proprietary as the similar offerings from EMCand HDS, with the exception that IBM has licensed them to both EMC and HDS. Thanks to cross-licensing, you can do [FlashCopy on EMC] equipment. Getting all the storage vendors to agree to open standards for these copy services is still workin progress under [SNIA], but at least people who have coded z/OS JCL batchjobs that invoke FlashCopy utilities can work the same between IBM and EMC equipment.
So for those out there who thought that my comment about EMC's proprietary interfaces in any way implied thatIBM did not have any of its own, the proverbial ["pot calling the kettle black"] so to speak, I apologize.
BarryB shows off his [PhotoShop skills] with the graphic below. I take it as a compliment to be compared to anAll-American icon of business success.
TonyP and Monopoly's Mr. Pennybags Separated at Birth?
However, BarryB meant it as a reference back to long time ago when IBMwas a monopoly of the IT industry, which according to [IBM's History], ended in 1973. In other words, IBMstopped being a monopoly before EMC ever existed as a company, and long before I started working for IBM myself.
The anti-trust lawsuit that BarryB mentions happened in 1969, which forced IBM to separate some of the software from its hardware offerings, and prevented IBM from making various acquisitions for years to follow, forcing IBM instead into technology partnerships. I'm glad that's all behind us now!
Continuing my week's theme on how bad things can get following the "Do-it-yourself" plan, I start with James Rogers' piece in Byte and Switch, titled[Washington Gets E-Discovery Wakeup Call]. Here's an excerpt:
"A court filing today reveals there may be gaps in the backup tapes the White House IT shop used to store email. It appears that messages from the crucial early stages of the Iraq War, between March 1 and May 22, 2003, can't be found on tape. So, far from exonerating the White House staffers, the latest turn of events casts an even harsher light on their email policies.
Things are not exactly perfect elsewhere in the federal government, either. A recent [report from the Government Accountability Office (GAO)] identified glaring holes in agencies’ antiquated email preservation techniques. Case in point: printing out emails and storing them in physical files."
You might think that laws requiring email archives are fairly recent. For corporations, they began with laws like Sarbanes-Oxley that the second President Bush signed into law back in 2002. However, it appears that laws for US Presidents to keep their emails were in force since 1993, back when the first President Clinton was in office. (we might as all get used to saying this in case we have a "second" President Clinton next January!)
"The Federal Record Act requires the head of each federal agency to ensure that documents related to that agency's official business be preserved for federal archives. The Watergate-era Presidential Records Act augmented the FRA framework by specifically requiring the president to preserve documents related to the performance of his official duties. A [1993 court decision] held that these laws applied to electronic records, including e-mails, which means that the president has an obligation to ensure that the e-mails of senior executive branch officials are preserved.
In 1994, the Clinton administration reacted to the previous year's court decision by rolling out an automated e-mail-archiving system to work with the Lotus-Notes-based e-mail software that was in use at the time. The system automatically categorized e-mails based on the requirements of the FRA and PRA, and it included safeguards to ensure that e-mails were not deliberately or unintentionally altered or deleted.
When the Bush administration took office, it decided to replace the Lotus Notes-based e-mail system used under the Clinton Administration with Microsoft Outlook and Exchange. The transition broke compatibility with the old archiving system, and the White House IT shop did not immediately have a new one to put in its place.
Instead, the White House has instituted a comically primitive system called "journaling," in which (to quote from a [recent Congressional report]) "a White House staffer or contractor would collect from a 'journal' e-mail folder in the Microsoft Exchange system copies of e-mails sent and received by White House employees." These would be manually named and saved as ".pst" files on White House servers.
One of the more vocal critics of the White House's e-mail-retention policies is Steven McDevitt, who was a senior official in the White House IT shop from September 2002 until he left in disgust in October 2006. He points out what would be obvious to anyone with IT experience: the system wasn't especially reliable or tamper-proof."
So we have White House staffers manually creating PST files, and other government agencies printing out their emails and storing them in file cabinets. When I first started at IBM in 1986, before Notes or Exchange existed, we used PROFS on VM on the mainframe, and some of my colleagues printed out their emails and filed them in cabinets. I can understand how government employees, who might have grown up using mainframe systems like PROFS, might have just continued the practice when they switched to Personal Computers.
Perhaps the new incoming White House staff hired by George W. Bush were more familiar with Outlook and Exchange, and ratherthan learning to use IBM Lotus Notes and Domino, found it easier just to switch over. I am not going to debatethe pros and cons of "Lotus Notes/Domino" versus "Microsoft Outlook/Exchange" as IBM has automated email archiving systems that work great for both of these, as well as also for Novell Groupwise. So, taking the benefit of the doubt,when President Bush took over, he tossed out the previous administration's staff, and brought in his own people, andlet them choose the office productivity tools they were most comfortable with.Fair enough, happens every time a new President takes office. No big surprise there.
However, doing this without a clear plan on how to continue to comply with the email archive laws already on the books, and that it continues to be bad several years later, is appalling. I can understand why business are upset in deploying mandated archiving solutions when their own government doesn't have similar automation in place.
I had a great weekend, participating in this year's ["World Laughter Day"] yesterday, and preparingfor tonight's festivities, found me pulling out the various packages from "Simply Dinners" from my freezer.
A Tucson-based company, [Simply Dinners] offers an alternative to restaurant eating.My sister went there, assembled a set of freezer-proof plastic bags containingall the right ingredients based on specific recipes, and gave them to me for my birthday, and they have been sitting in my freezer ever since... until last weekend.
My sister was careful to choose items that fit my [Paleolithic Diet] that my nutritionist has me on. However, I was skepticalthat any plastic bag full of frozen groceries would be any better than anything I could assemble on my own.I did, after all, attend "chef school" and do know how to cook well. Each package was intended to be a "dinner for two" but since I am single, was two meals each for me.
So, I decided to try them out, which would also give me more room in my freezer for incoming items, and theycame out very well. The outside of each plastic bag was a label that explained all the steps required to heatthe food. Partially-cooked vegetables were wrapped in foil, and went in for the last 10 minutes of cooking the meat.The process was straightforward, and the meals were delicious, but nothing I could not have done on my own witha recipe and a trip to the grocery store.
The question is whether someone with little or no skills could achieve similar, or acceptable results. I havefriends who are limited to assembling sandwiches from luncheon meats and cheese slices, as anything involvingheat other than simply boiling water is beyond their skills.
The key difference between "cooking for yourself" and "building your own storage" is that you aren't buildingstorage for just yourself. Unless you are a one-person SMB company, you are building storage that all of youremployees and managers count on to do their jobs, and by extension your customers and stockholders count on.
Of course I had to read responses from others before jumping in with my thoughts.Dave Raffo from Storage Soup writes [Sun going down in storage],feeling this is yet another indication that Sun has lost their mind, recounting previous events that supportthat theory.EMC blogger Mark Twomey in his StorageZilla posts [When Open Isn't] felt a littlebit guilty kicking a competitor when down. EMC blogger Chuck Hollis questions the reasons peoplemight be tempted to even try this in his post [Do-it-Yourself Storage]. Here'san excerpt:
I really, really struggle with this concept, I do. Here's why:
Anything I use and get comfortable with -- well, I'm "locked in" to a certain degree. If I use a lot of storage software X; well, I'm sorta locked in, aren't I? Or, if I put my servers-as-storage on a three-year lease, I'm kind of locked in, aren't I?"
(For EMC, vendor lock-in is great when customers are using and comfortable with EMC products, and awful when they use andare comfortable with storage from someone else. But nobody who is "comfortable" with what they have ever complain about"vendor lock-in" do they? It's the ones who are growing uncomfortable and feel trapped in changing. Howinvolved a company's use of EMC's proprietary interfaces are can greatly determine the obstacles in switching toa different vendor.Of course, if you count yourself as someone growing uncomfortable with your existing storage vendor, IBM can help you fix that problem, but that is a subject for another post.)
Worried about "vendor lock-in"? Try "admin lock-in" where you must keep a storage admin around because he or shewas the one that put your storage together. I've seen several companies held hostage by their system adminsfor home-grown scripts that serve as "duct tape for the enterprise".The other issue is whether you have storage admins who have the necessary hardware and software engineering skillsto put suitable storage together. There are some very smart storage admins I know who could, and others that wouldhave a difficult time with this.
No doubt this is promising for the home office. I myself have taken several PCs that were running older versions of Windows,but not powerful enough to upgrade to Windows Vista, wiped them clean, loaded Linux, and configured them from everythingfrom simple browser workstations to full LAMP application server configurations. While this might sound easy, I am a professional hardwareand software engineer with Linux skills.I have no doubt that someone with sufficient engineering and Solaris skills could put together a storage system for home use.
One area where Sun definitely benefits from this "Open Storage" approach is to develop Solaris skills. I have no personal experience with OpenSolaris, but assume that if you learn it, you would be able to switch overto full Solaris quite easily.Today, most people have Windows, Linux and/or MacOS skills coming into the workforce, and this could be Sun's way of getting new fresh faces who understand Solaris commands to replace retiring "baby boomers". The lack of Solaris-knowledgeable admins is perhaps one reason why companies are switching to IBM AIX, Linux or Windows in theirdata center.
Certainly, IBM's strategic choice to support Linuxhas been a great success. People learn Linux on their home systems, and at school, and are able to carry those skillsto Linux running on everything from the smallest IBM blade server to IBM's biggest mainframe.
The videos on Sun for the "recipes" on how to put together various "storage configurations in ten minutes" appear simplerthan last summer's "How to hack an Apple iPhone to switch away from AT&T" procedures.
Continuing this week's theme on "best of breed", some questions arise: How is this calculated or determined?How is one storage solution "better" than another? Which attributes weigh more heavily in the decision?
Some attributes are directly measurable, like storage performance. For this, gather up a list ofall the storage products you are interested in, go to the [Storage Performance Council website],determine whether SPC-1 or SPC-2more closely matches your application workload, and then choose the best product fromthe benchmarks, discarding any vendors that don't bother to have benchmarks posted.The new SPC-2 benchmark was created, in part, to address new workloads for the Media and Entertainmentindustry. (For a comparison of the two, see my post [SPC benchmarks for Disk System Performance])
However, other attributes, like "easy to manage", are not as straightforward to measure.One client compared the complexity of different solutions by counting the number of cables involved to connect the various parts of each solution. Only external cables were considered. All of the cables inside an IBM SystemStorage DS8000 would not be counted. By this measure, a single IBM System z10 EC mainframe connected to a single IBM DS8000 disk system over a few FICON cables would therefore be "less complicated" than a thousand x86 servers connectedvia FCP SAN switches to dozens of disk systems.
But counting cables only handles the hardware part of the interconnections. You have to also considerthe interconnections between the software, between users, and between IT administrators. It is not alwaysobvious where those connections are, and how to count them into consideration.
This month, IBM introduced the first "Management Complexity Factor" (MCF) for the Media andEntertainment industry. IBM MCF a result of IBM's acquisition of NovusCG, and is an essential part of"Storage Optimization Services" being offered by IBM. Here is an excerpt from the[IBM Press Release]:
"Media companies are facing a double-edged sword with the exponential rise in digital media storage needs, coupled with concerns about optimizing storage to be more efficient," said Steve Canepa, vice president of Media and Entertainment, IBM. "By quickly and cost-effectively analyzing the interconnected IT and storage environments that increasingly comprise media operations, MCF for Media helps our clients identify opportunities for improvement and align their IT and business strategies."
Since 1995, IBM has invested more than $18 billion on public acquisitions, making it the most acquisitive company in the technology industry, based on volume of transactions.
IBM has a strong global focus on the media and entertainment industry across all of its services and products, serving all the major industry segments -- entertainment, publishing, information providers, media networks and advertising.
In his post on Rough Type titled ["McKinsey surveys the new software landscape"], Nick Carr discusses the growing acceptance in the marketplace for Software-as-a-Service, or SaaS.He summarizes the results of McKinsey's recent[Enterprise Software Customer Survey 2008].IBM is already well established as part of the Web 2.0 Big "5" (the other four are Google, Yahoo, Amazon and Microsoft), so it may not be much surprise that it introduced some new offerings focused on this emerging market.
For managed hosting, [IBM Managed Storage Services] hasbeen extended to support archive data through its entire lifecycle: supporting access, migration, non-erasablenon-rewriteable (NENR) protection, and expiration/destruction. This offering supports locating the storage onthe customer premises, a hosting center, or an IBM Service Deliver Center. IBM's blended disk and tape approachprovides a better alignment between information value and storage costs.
Last December IBM acquired Arsenal Digital, which offers a remote "Enterprise Email Archive" service, supporting retention policies that can apply per user,per group, or even my message, as needed. This service provides fast user access to email archives, as well as e-discovery search. The search is not just for the email body text, but supports over 370different attachment types as well. Deduplication technology is used to reduce the actual amount of storage needed by 80percent. All of this with the security and comfort of knowing that these email archives are encrypted and protected in a disaster recovery class datacenter managed by IBM.Blocks and Files presents their thoughts on this in the article["IBM storing data and mail in the cloud"].
The Radicati Group has published some interesting statistics about email archive in[Volume 4, Issue 3]. Here's an excerpt:
"In 2007, a typical corporate email account receives about 18 MB of data per day. This number is expected to grow to over 28 MB by 2011. Today, there is no way to effectively manage these messages, but with the help of an archiving solution.
Today, the worldwide percentage of corporate mailboxes protected by archiving solutions is estimated to be around 14%, however it is growing at a fast pace, and is expected to reach over 70% by 2011.
A survey of 102 corporate organizations worldwide, showed that 68% of large businesses view compliance as their top security concern in 2007."
For those who are actually providing these services to others, over the cloud, then you might want to use the new[IBM System x iDataPlex].Compared to traditional server environments, the iDataPlex provides five times the computing power by doubling the number of servers per rack, but with 40 percent less energy consumption. Thanks to clevercooling technology, the system can run in standard office "room temperature" environments. You cancustomize with a mix of compute, network and storage nodes to meet your application requirements.In addition to Web 2.0 and SaaS workloads, the iDataPlex can be useful for financial risk analysis,high performance computing, and even batch processing.
Whether you are looking to contract out for SaaS, or to provide a service to others over the cloud, IBM can help!
"Our survey data shows that over the past 12 months, more firms have bought their storage from a single vendor. While this might not be for everyone, it's worth serious consideration for your environment. Maybe you won't get the best price per gigabyte every time, but you'll probably save money in the long run because of simpler management, increased staff specialization, increased capacity utilization, and better customer service."
A Forrester survey of 170 companies ranging from SMBs to large enterprises in North America and Europe found that more than 80 percent bought their primary storage from one vendor over the last year. That includes 64 percent of the companies with more than 500 TB of raw storage.
The report, written by analyst Andrew Reichman, says using more than one primary storage vendor can make it more complex to manage, provision and support the storage environment. And while using multiple vendors can often bring better pricing, buying from one vendor can result in volume discounts.
“You may have tried to contain costs by forcing multiple incumbent vendors to continuously compete against each other, with price as the primary differentiator,” Reichman writes. “This strategy can reduce prices and limit vendor lock-in, but it can also lead to management complexity and poor capacity utilization.”
The report recommends keeping things simple by and using fewer vendors when possible. However, that advice comes with several caveats: buying all storage from one vendor means taking the bad with the good, and some vendors’ product families differ so much “they may as well come from different vendors.”
As if by coincidence, fellow blogger from EMC Chuck Hollis gives his reflections on this same topic. Here's an excerpt:
When it comes to buying storage (or any infrastructure technology, for that matter), there seem to be two camps:
Best-of-breed (i.e. multivendor): -- buy what's best, get the best price, keep all the vendors on their toes, etc. etc.
Single vendor: primarily use one vendor's offerings, and hold them accountable for the outcome.
If Chuck had said "multivendor" versus "single vendor", then that would have been a true dichotomy, but interestinglyhe equates best-of-breed with a multivendor approach. Let's consider two examples:
Disk from one vendor, Tape from another
Here is a multivendor strategy, and if you have a clear idea of what best-of-breed means to you, then you couldpick the best disk in the market, and the best tape in the market. However, I don't think this keeps either vendor"on their toes", or helps you negotiate lower prices by threatening to switch to the other vendor. In shops likethis, the staffing usually matches, so there are disk administration and tape operations, with little or no overlap, andlittle or no interest in retraining to use a new set of gear. It is true that disk-based VTL could be used where real tape libraries are used, but this may not be enough to threaten your existing vendors that you will switch all your disk to tape, or all your tape to disk.
One could argue that the vendor that sells the besttape could be the exact same vendor that sells the best disk. In this case, your multivendor strategy would actuallywork against you, forcing you away from one of your best-of-breed choices.
Disk and Tape from one vendor for some workloads, Disk and Tape from another vendor for other workloads
Here is a different multivendor strategy. Having disk and tape for the same vendor allows you to take advantageof possible synergies. The IT staff knows how to use the products from both vendors. This strategy does let you keep your vendors "on their toes". You can legitimately threaten to shift your budget from one vendor over another.However, whatever your definition of best-of-breed is, chances are the product from one vendor is, and the other vendor is not. Both meet some lowest common denominator, meeting some minimum set of requirements, which would allow you to swap out one for the other.
I guess I look at it differently. The equipment in your data center should be thought of as a team. Do your servers, storage and software work well together?
While Americans like to celebrate the accomplishments of individual musicians, athletes or executives, it is actually bands that compete against other bands, sports teams that compete against other sport teams, and companies that compete against other companies. Teamwork in the data center is not just for the people who work there, but also for the IT equipment. Just as a new incoming athlete may not get along well with teammates, shiny new equipment may not get along with your existing gear. Conversely, your existing infrastructure may not let the talents or features of your new equipment shine through.
Putting together the best parts from different teams might serve as a great diversion for those who enjoy["fantasy football"], it may not be the best approach for the data center. Instead, focus on managing your data center as a team, perhaps with theuse of IBM TotalStorage Productivity Center to minimize the heterogeneity of your different equipment. Pick an ITvendor that sells "team players" for your servers, storage and software, with broad support for interoperability and compatibility.
The "Storage Symposium Mexico - 2008" conference was a great success this week!
Day 1 - The plan was for me to arrive for the Wednesday night reception. Eachattendee was given a copy of my latest book[Inside System Storage: Volume I] and I was planning to sign them. I thought perhaps we should have a "book signing" tablelike all of the other published authors have.
Things didn't go according to plan. Thunderstorms at the Mexico City airport forced our pilot to find an alternate airport. Nearby Acapulco airport was the logical choice, but was full from all the otherflights, so the plane ended up in a tiny town called McAllen, Texas. I did not arrive until the morning of Day 2,so ended up signing the books throughout Thursday and Friday, during breaks and meals, wherever they couldfind me!
Special thanks to fellow IBMer Ian Henderson who picked me up from the airport at such an awkward hour anddrive me all the way to Cuernavaca!
All of us, IBMers, Business Partners and clients alike, all donned black tee-shirtswith a white eightbar logo for a group photo with one of those "wide lens" cameras. While we werebeing assembled onto the bleachers, I took this quick snapshot of myself and some of the guys behind me.
I was original scheduled to be first to speak, but with my flight delays, was moved to a time slot after lunch.After a big Mexican lunch, the conference coordinators were afraid the attendees might fall asleep,a Mexican tradition called [siesta], so I wasinstructed to WAKE THEM UP! Fortunately, my topic was Information Lifecycle Management, a topicI am very passionate about, since my days working on DFSMS on the mainframe. With 30percent reduction in hardware capital expenditures, 30 percent reduction in operational costs, and typical payback periods between 15 to 24 months, the presentation got everyone's attention.
Of course, a lot happens outside of the formal meetings. We had a Japanese theme dinner, where we woreJapanese Hachimaki [headbands]with the eightbar logo. For those not familiar with Japanese culture, hachimaki are worn today not so much for the practical purpose to catch the perspiration but rather for mental stimulation to express one's determination. Some students wear hachimaki when they study to put themselves in the right spirit and frame of mind.
Shown here are presenters Mike Griese (Infrastructure Management with IBM TotalStorage Productivity Center),Dave Larimer (Backup and Storage Management with IBM Tivoli Storage Manager), myself, and John Hamano(Unified Storage with IBM System Storage N series).
Day 3 - Wrapping up the week, I presented two more times.
First, I covered IBM Disk Virtualization with IBM SAN Volume Controller. One interesting question was if the SAN Volume Controller could be made to looklike a Virtual Tape Library. I explained that this was never part of the original design, but that if you wantto combine SVC with a VTL into a combined disk-and-tape blended solution, consider using theIBM product called Scale-Out File Services[SoFS] which I covered in my post[Moredetails about IBM clustered scalable NAS].
During one of the breaks, I took a picture of the behind-the-scenes staff that put this together. They had created these huge blocks representing puzzle pieces, emphasizing how IBM is one of the few ITvendors that can bring all the pieces together for a complete solution.
Shown hereare Mike Griese (presenter), Cyntia Martinez, Claudia Aviles, Cesar Campos (IBM Business Unit Executive forSystem Storage in Mexico), and Claudia Lopez. Each day the staff wore matching shirts so that it was easyto find them.
Later, I covered Archive and Compliance Solutions to highlight our complete end-to-end set of solutions.When asked to compare and contrast the architectures of the IBM System Storage DR550 with EMC Centera, I explainedthat the DR550 optimizes the use of online disk access for the most recent data. For example, if you aregoing to keep data for 10 years, maybe you keep the most recent 12 months on disk, and the rest is moved,using policy-based automation, to a tape library for the remaining nine years. This means that the disk insidethe DR550 is always being used to read and write the most recent data, the data you are most likely to retrievefrom an archive system. Data older than a year is still accessible, but might take a minute or two for the tapelibrary robot to fetch.The EMC Centera, on the other hand, is a disk-only solution. It offers no option to move older data to tape,nor the option to spin-down the drives to conserve power. It fills up after the same 12 months or so, and then you get towatch it the remaining nine years, consuming electricity and heating your data center.
I don't know about you, butI have never seen anyone purposely put in "space heaters" into their data center, but certainly a full EMC Centeradoes little else. Both devices use SATA drives and support disk mirroring between locations, but IBM DR550 offers dual-parity RAID-6, and supports encryption of the data on both the disk and the tape in the DR550. EMC Centerastill uses only RAID-5, and has not yet, as far as I know, offered any level of encryption. IBM System StorageDR550 was clocked at about three times faster than Centera at ingesting new archive objects over a 1GbE Ethernet connection.
This last photo is me and fellow IBMer Adriana Mondragón. She was one of my students in the [System Storage Portfolio Top Gun class],last February in Guadalajara, Mexico.She graduated in the top 10 percent of her group, earning her the prestigious titleof "Top Gun" storage sales specialist.
The conference wrapped up with a Mexican lunch with a traditional Mariachi band. I took pictures, but figured you allalready know what [Mariachi players] look like, and I didn't wantto detract from the otherwise serious tone of this blog post! This was the first System Storage Symposium in Mexico, butbased on its success, we might continue these annually.
Last week's focus was on tape libraries, both virtual and real, leading up to our IBM announcement ofacquiring Diligent Technologies. I was focused on HDS blogger Hu Yoshida's post about his conversation with Mark,who was on an expert panel about these topics. Mark discovered that of the top energy consumersin his datacenter, his tape library was in the top five, a surprising result. Hu suggested that switching to a VTL with deduplicationtechnology was a potential alternative, and I pointed to a whitepaper from the Clipper Group that suggested otherwise.
My response was that perhaps Highmark's choice of backup software was poorly written, or that they had set it up with thewrong parameters, and just changing hardware might not be the right answer. I went too far given that I didn't know which software they had, which parameters theywere using, or which tape technology was involved. This came across wrong. I meant to poke fun at Hu's response.I did not mean to imply that Mark and his staff hadmade poor choices, or that they should automatically reject Hu's advice to consider other hardware alternatives.
I have discussed the situation with Mark, and agree that I should know his situation better before offeringsuggestions of my own.
And, it's not too late to sign up for IBM Tivoli's [Pulse 2008] conference that will be heldin Orlando, Florida, May 18-22, 2008. I'll be there Sunday and Monday only, in the Tivoli Storage track, so if you are planning to attend and wish to meet up with me while I am there, please send me a note!
[Earth Day] is celebrated in many countries on April 22, which marks the anniversary of the birth of the modern environmental movement in 1970. Others celebrate this on the March equinox.
IBM has finally aggregated everything that we are doing around "Green" initiatives onto a single[IBM Green] landing page. This has everything from IBM's own activities as well as what we sell to our clients.
Also, to mark this occasion, IBM held an internal contest for employees to make videos about Earth Day,the environment, and IT's role in making the situation better. The grand prize winner, and 10 secondprize winners, are available on this [IBM Green Contest - YouTube channel].Of these, I liked "New Life for Old Silicon" (shown here on the left).
IBM also developed [Power Up, the Game], which is theEarth Day Network's "official" game for today's festivities. It's a 3-D game created by IBM Research to help save a fictitious planet - the goal being to help students learn about ecology and climate change. This game is also hoped to motivate young students to get interested in math, scienceand technology.Eightbar has a great post [PowerUp - A serious game out inthe wild] discussing this.Here's also a 3-minute[the making of "Power Up, the game" video] to geta behind-the-scenes look.The game is a downloadable Windows client that then connects to the main servers to run.
You could buy 10 liters of gasoline in Venezuela with this coin.
I'm back from South America, and am now in Chicago, Illinois. I'm having breakfast at the Starbucksdowntown, and thought I would make a post before all of my meetings today.
On this trip, I met with IBM Business Partners and sales reps from Argentina, Colombia, Ecuador and Venezuela. While I have visited thefirst three countries on past trips, this was my first time to Caracas, Venezuela. I grew up in La Paz, Bolivia, and speak Spanish fluently, so had no problemgetting around and holding discussions with everyone. While my friends in the US are oftensurprised I speak multiple languages, it doesn't surprise anyone I visit in other countries.If you are going to have worldwide job responsibilities for a global company that does businessin over 180 countries, the least you could do is learn a few additional languages. I suspect themajority of the 350,000 IBM employees speak at least two languages, the exceptions being mostly the 50,000 orso employees that live in the United States.
I flew on American Airlines from Tucson to Dallas to Caracas, and was only slightly delayed as a resultof all of the flight cancellations that happened earlier that week. Some companies designate a single "official airline" for their employees to use. That makessense if all of your employees are located in a single city, and that city is the hub for yourdesignated airline.IBM is too big, too spread out, and sells technology to nearly every airline to make sucha designation. Instead, IBM tries to spread its business out to multiple carriers, although all ofmy colleagues seems to have their own personal favorites. Mine are American Airlines, Singapore Airlines and Cathay Pacific.
While other people were upset over the delays, I found American Airlines did a great job keeping me informed,and all their employees I talked to seemed to be handling the situation fairly well. If youfly on American, I recommend you sign up for "text message" notifications. I did this for everyleg of my trip, and was kept up to date on times, gates and status. Very helpful!American Airlines even started their own corporate blog: [AA Conversation] (Special thanks to my friend[Paul Gillen] for pointing this out)
(I read somewhere that if you are going to travel anywhere, you need to remember to bringboth your sunscreen and your sense of humor, otherwise you are going to get burned. Goodadvice! Trust me, you don't even know how bad it can really be until you travel in the third world.)
Anyhoo, last week, IBM Venezuela celebrated its 70th anniversary. That's right, IBM has been doingbusiness in Venezuela for the past 70 years. Also last week, IBM put out its impressive [1Q08 quarterly results],including 10 percent growth for IBM System Storage product line worldwide, comparing what IBM earned this first quarter to what IBM earned the first quarter of last year. For just the Latin American countries,the growth for IBM System Storage was 20 percent!There are a lot of oil and gas companies in Venezuela. With a barrel of oil selling at more than$117 US dollars, these companies are looking to spend their newly earned profits on IBM systems, software and services.
As for the picture above, that is a one-thousand Bolivares coin, worth about 47 US cents atthis week's official exchange rate. As with many Latin American countries going through [years of high inflation], Venezuela was tired of all those zeros on their money. For example, a cheeseburger, freedom fries and a Cokeat McDonald's would set you back 20,000 Bolivares.This year the Venezuelan governmentcreated a new currency called "Bolivares Fuertes" (VEF), lopping off the last three zeros.So, the coin above would be replaced by a new coin with a big "1" on it instead, and an old 2000 Bolivares billwould be replaced by a new 2 Bolivares Fuertes bill. Unfortunately,I had to give all my new Venezuelan money back at the airport upon leaving, but they let me keep the coinabove, since it is old money, as a souvenir so that I could use it as a ball mark for playing golf.
(The term Bolivares is named after Simon Bolivar who was born in Caracas. He is famous throughoutSouth America, and was, and I am not making this up, the first president of Colombia, the secondpresident of Venezuela, the first president of Bolivia, and the sixth president of Peru. Here isthe [Wikipedia article] to learn more.)
Gasoline costs a mere 100 old Bolivares per liter.For those who don't do metric, gasoline therefore costsless than 18 cents per gallon. By comparison, in the USA, the average today was $3.47 US dollarsper gallon, of which 18.4 cents of this is Federal tax. That's right, we pay more just in taxes forgasoline than los venezolanos pay for it all.
The side effect of cheap gas is bad traffic. Everybody in Venezuela drives their own car, and nobody thinksabout the price of gasoline, carpooling, or taking public transportation, acting much like Americans used to, up until a few years ago. With some of the gridlock we faced, it might have been faster (but not safer)to walk there instead.
Which makes me wonder if American Airlines fills up their airplanes with fuel at these lower prices when theypick up people in Caracas to take them back to the United States. In 2002, fuel represented 10 percentof the average airline's operating expenses, but today it is now 25 percent. That is a drastic increase!
The same is happening in data centers. In the past, electricity was so cheap, and such a small percentof the total IT budget, nobody gave it much thought. But as the usage of electricity increased, andthe cost per KWh went up, this has a multiplying effect, and the growth in power and cooling costs isgrowing four times faster than the average IT hardware budget increase.
Normally, IBM only makes announcements on Tuesdays, but today, Friday, IBM announces that it acquired Diligent Technologies. What? I got a lot ofquestions about this, so I thought I would start with this...
When I posted in January that[IBM Acquires XIV],fellow EMC blogger Mark Twomey of StorageZilla fame, sent me a comment:
"Ah now Tony I wasn't poking fun. Indeed I find it fascinating that Moshe who's been sitting out on the fringes for years having been banished for being an obstructionist to EMC entering the mid-market is now back.
Which reminds me what happens with Diligent? There his as well aren't they or has he packed his stake in that in?"
As you might have guessed, I am privy to a lot of stuff going on behind the scenes at IBM that I can't talk about in this blog, and all these rumors in the blogosphere about IBM acquisition of Diligent was a topic I couldn't officially recognize, defend or deny, until official IBM announcements were made.
In his latest post, Mark wonders about[the last Tape and Mainframe sales person on earth]. He recounts my interaction with fellow HDS blogger Hu Yoshia about the energy benefits ofVirtual Tape Libraries. Knowing that we were going to announcement IBM's acquisition of Diligent soon, I thoughtthis would be a worthy exchange, driving up the sales of Diligent boxes (whether you buy them from IBM or HDS).Diligent already had reselling arrangements with HDS, and IBM plans to continue thosearrangements going forward with HDS. As I have explained before in my post [Supermarketsand Specialty Shops], IBM and HDS cater to different customers, so if a customer who wants the best technologyfrom a specialty shop, they can buy IBM Diligent products from HDS, but if they want one-stop shopping, they can buyIBM Diligent directly from IBM or its other IBM Business Partners.
(Perhaps a more tricky situation is that Diligent also had an arrangement with Sun Microsystems, which competesdirectly against IBM as another IT supermarket vendor, but I have not heard how IBM has decided to handle thisgoing forward.)
For more on this intricate mess of interconnected companies, alliances and partnerships, read Dave Raffo's article[Data dedupe dance cardfilling up] over at Storage Soup.
So, let's tackle the first question:
Q1. What will happen to IBM's real tape library business?
Come on! IBM is Number one in tape, we've had virtual tape libraries since 1997 (the first in the industry)and continue to do well in both virtual and real tape libraries. Both provide value to the customer, and bothhave their place as part of the overall "information infrastructure". This acquisition provides yet another choicefor clients on our "supermarket" shelf.
(For those following the ["which is greener"] discussion, the robot of the IBM TS3500 real tape library consumes185W per frame (when moving) and each tape drive consumes 50W (when actively working on a tape). Compared to 13W per SATA disk drive, each 6-drive frame of a TS3500 consumes as much electricity as 37 SATA disk drives. If you are not running backups 24x7, the total KWh per day for your tape library is actually quite less, but as several people have pointed out, there are customers that do run backups 80-90 percent of the time. LTO-4 tapes can hold 800GB uncompressed, and SATA disk are now available in 1TB (1000 GB) size, so you can have fun with your own comparisons.)
Meanwhile, Scott Waterhouse, one of the few people at EMC who understand tape workloadslike backup and archive, takes me to task in his Backup Blog with his post[I want a Red Ferrari].For those who are surprised that anyone at EMC might understand backup workloads, EMC did acquire a company calledLegato, and perhaps Scott came from that acquisition. I've never met Scott in person, but based solely only fromhis writings, he seems to know his stuff and makes strong arguments for using IBM Tivoli Storage Manager (TSM) with deduplication and virtual tape libraries.
While TSM does a good job of "deduplicating" at the client first, backing up only changed data, Scott feels database and email repositories must be backed up entirely each time, which is what happens in many other backup software products. Some clients might have 80 percent database/email and only 20 percent files, while others might have less than 20 percent database/email and 80 percent files, so this might influence whether deduplication will have small or big benefit.If TSM has to backup the entire database, even though little has changed since the last backup, that is where deduplication on a virtual tape library can come in handy. For IBM DB2 and Oracle databases, IBM TSM application-aware Tivoli Data Protection module interface backs up only changed data, not the entire file. Thanks to IBM's FilesX acquisition-- (also coincidently from Israel) --IBM can extend this support now to SQL Server databases as well.However, to be fair, Scott is partly correct, TSM does backup some database and email repositories in their entirety, which is why it is a good idea to have BOTH an IBM virtual tape library with deduplication and Tivoli Storage Manager to handle all cases. This brings us to the next question:
Q2. What will happen to IBM's patented "progressive backup" technology?
IBM will continue to use TSM's progressive backup technology. TSM already works great with Diligent virtual tapelibraries. One example is LAN-free backup. In this configuration, the TSM client writes its backups directly toa virtual or real tape library, over the SAN, and then sends the list of files backed up to the TSM server over theLAN to record in its database. This can greatly reduce IP traffic on your LAN during peak backup periods. For more about this, see the IBM Redbook titled["Get More Out of Your SAN with IBM Tivoli Storage Manager"].
Jon Toigo from DrunkenData asks[Did IBM Do Due Diligence Before Making Diligent Acquisition a Done Deal?] which is probably always a valid question. Unlike XIV, I wasn't part of the Diligent acquisition team, so I can't provide first hand account of the process. I am told that the IBM team did all the right things to make sure everything is going to turn out right.Sadly, many companies that make acquisitions in the IT industry fail to make them work. Fortunately, IBM is one of the few companies that has a great success record, with over 60 acquisitions in the past six years.In the Xconomy forum, Wade Rousch writes[IBM and the Art of Acquisitions]and gives some insight why IBM is different. Jon did not understand why Cindy Grossman, IBM VP of tape and archive solutions, ran the analyst conference call for this announcement, which brings me to the next question:
Q3. What is Diligent virtual tape library going to be categorized as, a disk system or a tape system?
IBM organizes its storage systems based on the host application workloads.Products to address disk workloads (SVC, DS8000 series, DS6000 series, DS4000 series, DS3000 series, N series, XIV Nextra) are in our disk systems group. Storage that appears to host applications like a tape system to address workloads like backup and archive (tape drives, libraries and tape virtualization) are in our tape and archive group. IBM Diligent has two products, one for big workloads and one for medium workloads. Both look liketape systems, so our tape and archive team, who understand tape workloads like backup and archive the best, are obviously the best choice to support IBM Diligent in the mix.
IBM will offer both N series and Diligent deduplication capabilities. For disk workloads, IBM N series offers a post-process deduplication feature at no additional charge. For tape workloads, IBM will now offer an in-line deduplication feature with Diligent Technologies. Different workloads, different offerings.
As with any acquisition, there will be some changes. The 100 folks from Diligent will get to learn the IBM wayof doing things. This brings me to our fifth and final question:
Q5. What is the correct spelling: deduplication or de-duplication?
It appears that Diligent has a corporate-wide standard to hyphenate this term (de-duplication), but the "word police" at IBM that control and standardize all "proper spellings, trademarks, and capitalization" have sent me corporate instructions a few days ago that IBM does not to hyphenate this term (deduplication). So, going forward, it will be "deduplication", or "dedupe" for short.I suspect one of the first tasks that our new IBMers from Diligent will be doing is removing all those hyphens fromthe [Diligent Technologies website]!
That's all for now, I'm off to Chicago, Illinois tomorrow!
I am still wiping the coffee off my computer screen, inadvertently sprayed when I took a sip while reading HDS' uber-blogger Hu Yoshida's post on storage virtualization and vendor lock-in.
HDS is a major vendor for disk storage virtualization, and Hu Yoshida has been around for a while, so I felt it was fair to disagree with some of the generalizations he made to set the record straight. He's been more careful ever since.
However, his latest post [The Greening of IT: Oxymoron or Journey to a New Reality] mentions an expert panel at SNW that includedMark O’Gara Vice President of Infrastructure Management at Highmark. I was not at the SNW conference last week in Orlando, so I will just give the excerpt from Hu's account of what happened:
"Later I had the opportunity to have lunch with Mark O’Gara. Mark is a West Point graduate so he takes a very disciplined approach to addressing the greening of IT. He emphasized the need for measurements and setting targets. When he started out he did an analysis of power consumption based on vendor specifications and came up with a number of 513 KW for his data center infrastructure....
The physical measurements showed that the biggest consumers of power were in order: Business Intelligence Servers, SAN Storage, Robotic tape Library, and Virtual tape servers....
Another surprise may be that tape libraries are such large consumers of power. Since tape is not spinning most of the time they should consume much less power than spinning disk - right? Apparently not if they are sitting in a robotic tape library with a lot of mechanical moving parts and tape drives that have to accelerate and decelerate at tremendous speeds. A Virtual Tape Library with de-duplication factor of 25:1 and large capacity disks may draw significantly less power than a robotic tape library for a given amount of capacity.
Obviously, I know better than to sip coffee whenever reading Hu's blog. I am down here in South America this week, the coffee is very hot and very delicious, so I am glad I didn't waste any on my laptop screen this time, especially reading that last sentence!
In that report, a 5-year comparison found that a repository based on SATA disk was 23 times more expensive overall, and consumed 290 times more energy, than a tape library based on LTO-4 tape technology. The analysts even considered a disk-based Virtual Tape Library (VTL). Focusing just on backups, at a 20:1 deduplication ratio, the VTL solution was still 5 times per expensive than the tape library. If you use the 25:1 ratio that Hu Yoshida mentions in his post above, that would still be 4 times more than a tape library.
I am not disputing Mark O'Gara's disciplined approach. It is possible that Highmark is using a poorly written backup program, taking full backups every day, to an older non-IBM tape library, in a manner that causes no end of activity to the poor tape robotics inside. But rather than changing over to a VTL, perhaps Mark might be better off investigating the use of IBM Tivoli Storage Manager, using progressive backup techniques, appropriate policies, parameters and settings, to a more energy-efficient IBM tape library.In well tuned backup workloads, the robotics are not very busy. The robot mounts the tape, and then the backup runs for a long time filling up that tape, all the meanwhile the robot is idle waiting for another request.
(Update: My apologies to Mark and his colleagues at Highmark. The above paragraph implied that Mark was using badproducts or configured them incorrectly, and was inappropriate. Mark, my full apology [here])
If you do decide to go with a Virtual Tape Library, for reasons other than energy consumption, doesn't it make sense to buy it from a vendor that understands tape systems, rather than buying it from one that focuses on disk systems? Tape system vendors like IBM, HP or Sun understand tape workloads as well as related backup and archive software, and can provide better guidance and recommendations based on years of experience. Asking advice abouttape systems, including Virtual Tape Libraries, from a disk vendor is like asking for advice on different types of bread from your butcher, or advice about various cuts of meat at the bakery.
The butchers and bakers might give you answers, but it may not be the best advice.
Fellow blogger and cartoon writer Scott Adams writes in his Dilbert Blog posts about the [Monte Hall problem].Monte Hall was the host of the American game show Let's Make a Deal. Here is an excerpt:
"The set up is this. Game show host Monte Hall offers you three doors. One has a car behind it, which will be your prize if you guess that door. The other two doors have goats. In other words, you have a 1/3 chance of getting the car.
You pick a door, but before it is opened to reveal what is behind it, Monte opens one of the doors you did NOT choose, which he knows has a goat behind it. And he asks if you want to stick with your first choice or move to the other closed door. One of those two doors has a car behind it. Monte knows which one but you don’t."
Mathematically, on your initial choice of doors, you have a 1/3 chance of picking the car, and 2/3 chance ofpicking a goat. But, after you make a choice, Monte knows which door(s) have goats behind them, and selectsone that exposes the goat. If you stay with your initial choice, you still have a 1/3 chance that you win acar, but if you change your mind and choose the other door, your odds double, you have a 2/3 chance of winning.This is not obvious at all to most people, so Scott points people to the [Wikipedia entry] that provides the mathematicaldetails.
What does this have to do with storage?
When you pick a disk system, you are hoping you pick the door with the car. You want a disk system that meets your performance requirements for your particular workload and easy to deploy, configure and manage, with a low total cost of ownership for the three, four or five years you plan to use it.However, with over forty different storage vendors, there are some doors that might have goats. Some vendorshave only 90 day warranties for their software, and I don't know any customers that replace their disk systems that often.
(It was pointed out to me that it was unfair in my last week's post about[Xiotech'slow cost RAID brick], that I singled out EMC offering minuscule 90day warranties for the software needed to run their disk systems. I apologize. I have sincelearned that HDS and HP also shaft their clients with 90 day warranties. Apparently thereare a lot of vendors out there who lack confidence in the quality of their software!)
It would be nice if everyone published all of their performance benchmarks so that you canchoose the right door with the car behind it, but sadly in the storage industry, not everyoneparticipates with industry-standard benchmarks like the[Storage Performance Council].
In other cases, people make their choices based on past decisions. Perhaps someone beforethem chose one vendor over another, and it seems simple enough just to stay with the originalchoice. It is amazing how often people stay with their company's original choice, what we call in the industry the "incumbent vendor", without exploring alternatives.
So, if you bought an EMC, HDS or HP disk system in the last 90 days, it's not too late for you.Tell your local IBM rep that you are afraid you picked the door with the goat, and that you want to change your mind, and choose the other door and go with IBM instead.
You will double your chances of being happier with your new choice!
Storage Networking World conference is over, and the buzz from the analysts appears to be focused onXiotech's low-cost RAID brick (LCRB) called Intelligent Storage Element, or ISE.
(Full disclosure: I work for IBM, not Xiotech, in case there weren't enough IBM references on this blog page to remindyou of that. I am writing this piece entirely from publicly available sources of information, and notfrom any internal working relationships between IBM and Xiotech. Xiotech is a member of the IBM BladeCenteralliance and our two companies collaborate together in that regard.)
Fellow blogger Jon Toigo in his DrunkenData blog posted [I’m Humming “ISE ISE Baby” this Week] and then a follow-up post[ISE Launches]. I looked up Xiotech's SPC-1benchmark numbers for the Emprise 5000 with both 73GB and 146GB drives, and at 8,202 IOPS per TB, does not seem to be as fast as IBM SAN VolumeControllers 11,354 IOPS per TB. Xiotech offers an impressive 5 year warranty (by comparison, IBM offers up to 4 years, and EMC I think is stillonly 90 days).Jon also wrote a review in [Enterprise Systems]that goes into more detail about the ISE.
Fellow blogger Robin Harris in his StorageMojo blog posted [SNW update - Xiotech’s ISE and the dilithium solution], feeling that Xiotech should win the "Best Announcement at SNW" prize. He points to the cool video on the[Xiotech website]. In that video, they claim 91,000 IOPS.Given that it took forty(40) 73GB drives (or 4 datapacs) in the previous example to get 8,202 IOPS for 1TB usable, I am guessing the 91,000 IOPS is probably 44 datapacs (440 drives) glommed together, representing 11TB usable.The ISE design appears very similar to the "data modules" used in IBM's XIV Nextra system.
Fellow blogger Mark Twomey from EMC in his StorageZilla blog posted[Xiotech: Industry second]correctly points out that Xiotech's 520-byte block (512 bytes plus extra for added integrity) was not the firstin the industry. Mark explains that EMC CLARiiON had this since the early 1990's, and implies in the title that this must have been the first in the industry, making Xiotech an industry second. Sorry Mark, both EMC and Xiotech were late to the game. IBM had been using 520-byte blocksize on its disk since 1980 with the System/38. This system morphed to the AS/400, and the blocksize was bumped up to 522 bytes in 1990, and is now called the System i, where the blocksize was bumped up yet again to 528 bytes in 2007.
While IBM was clever to do this, it actually means fewer choices for our System i clients, being only able to chooseexternal disk systems that explicitly support these non-standard blocksize values, such as the IBM System Storage DS8000and DS6000 series. (Yes, BarryB, IBM still sells the DS6000!) The DS6000 was specifically designed with the System i and smaller System z mainframes in mind, and in that niche does very well. Fortunately, as I mentioned in my February post [Getting off the island - the new i5/OS V6R1], IBM has now used virtualization, in the form of the VIOS logical partition, to allow i5/OS systems to attach to standard 512-byte block devices, greatly expanding the storage choices for our clients.
(Side note: SNW happens twice per year, so the challenge is having something new and fresh to talk about each time. While Andy Monshaw, General Manager of IBM System Storage, highlighted some of the many emerging technologies in his keynote address, IBM shipped on many of them prior to his last appearance in October 2007: thin provisioning in the IBM System Storage N series, deduplication in the IBM System Storage N series Advanced Single Instance Storage (A-SIS) feature, and Solid State Disk (SSD) drives in the IBM BladeCenter HS21-XM models. Of course, not everyone buys IBM gear the first day it is available, and IBM is not the only vendor to offer these technologies. My point is that for many people, these are still not yet deployed in their own data center, and so they are still in the future for them. However, since these IBM deliveries happened more than six months ago, they're old news in the eyes of the SNW attendees. While those who follow IBM closely would know that, others like[Britney Spears] may not.)
Back in the 1990s, when IBM was developing the IBM SAN Volume Controller (SVC), we generically called the managed disk arrays that were being virtualized by the SVC as "low-cost RAID brick" or LCRB. The IBM DS3400 is a good example of this. However, as we learned, SVC is not just for LCRB, it adds value in front of all kinds of disk systems, including the not-so-low-cost EMC DMX and IBM DS8000 disk systems. ISE might make a reasonable back-end managed disk device for IBM SVC to virtualize. This gives you the new cool features of Xiotech's ISE, with IBM SVC's faster performance, more robust functionality and advanced copy services.
Next week, I'll be in South America in meetings with IBM Business Partners and storage sales reps.
My colleague, Marissa Benekos, is on location with her video camera in Orlando, Florida for theComputerWorld [Storage Networking World] conference.
The IT specialists from the IBM booth were excited at David Bricker's debut on YouTube.Here's the rest of the gang in this [video].
Here's Andy Monshaw, General Manager of IBM System Storage and keynote speaker at this SNW event, summarizingIBM's "Information Infrastructure" strategy in 60 seconds in this [Youtube video].
This last video is Clod Barrera talking about the importance of security. Clod is an IBM Distinguished Engineerand Chief Technical Strategist for IBM System Storage product line. Here is his[Youtube video]
It looks like Marissa is having a lot of fun taking these videos at the event.More videos, as we get them, will be posted to the [IBM videos channel].
My IBM colleague Marissa Benekos brought her hand-held video camera to [Storage Networking World] conference in Orlando, Florida.I am not there, as I had a conflict with another conference going on here in Tucson, so am relyingon Marissa to feed me information to blog about.
In this segment, she interviews "booth babe" David Bricker. I've known David a long time,and if you are there at the conference, tell him I sent you to visit him at the IBM booth.
David Bricker shows off some of the IBM System Storage product line at SNWin this YouTube video (2 minutes)
Sadly, I can't be in two places at once. SNW is a great conference to attend!
Well, its Tuesday, and that means more IBM announcements!!!
Let's do a quick recap of what was announced for storage:
We now support 1000GB SATA-II drives in the DS4000 series. This is available for the DS4200 model 7V, DS4700, DS4800 as well as the expansion drawers EXP420 and EX810. When I asked our marketing team why we weren't going to say "1TB" like everyone else, they thought 1000GB sounds bigger. I guess I should not have asked that on April Fool's day. For more details, see the IBM press releases for the [DS4200/EXP420and DS4700/DS4800/EXP810].
IBM announced new machine code Release 1.4a for the The IBM Virtualization Engine™ TS7700 virtual tape library for our System z mainframe customers.Various features come with this new level of machine code. See the IBM [Press Release] for more details.
Load balancing across the grid
Host control over the copy of logical volumes on a cluster by cluster basis
Option to gracefully remove an individual cluster from an existing grid
Initial-state reset for TS7700 database for cluster cleanup
Option to upgrade single-cache to dual-cache configuration
Also announced were updates to the 7214 model 1U2. Technically this is not in the IBM System Storage product line,but instead is designed specifically for our System p server line. This is a "media drawer" that allows you to havetape on one side, and optical on the other, in a single enclosure. IBM announced that you can now have DAT160 80GBdrives that is read-write compatible with DAT72 and DDS4 drives, and half-high LTO-4 drives that can read LTO-2 media, and is read-write compatible with LTO-3 media.Read the IBM [Press Release] for details.
Finally, if you are in the United States, Canada or the Carribean, there is a special discount promotionfor tape libraries purchased before June 20, 2008. This includes IBM TS3100, TS3200, TS3310 and TS3500 libraries.See the [Promotion Details] for eligibility.
IBM has added capability to the IBM TotalStorage Productivity Center for Replication. A quick review of the differentoptions for this component.
base Replication (uni-directional from primary to disaster site)
Two-site replication (bi-directional, including failover and failback)
Three-site replication (site awareness for all the copy sessions between all three sites in all situations)
Productivity Center for Replication supported all these levels for DS8000, DS6000 and ESS 800 disk models, butfor SVC it only supported FlashCopy and Metro Mirror for the uni-directional base. IBM announced version 3.4 today that has added support for SVC for Global Mirror (asynchronous disk mirroring) and bi-directional failover/failback. This supports lets you have "practice volumes" that allow IT managers to perform "disaster recovery exercises" without disrupting production workloads.
Also, for the DS8000, there is support for the new Space Efficient FlashCopy and DynamicVolume Expansion features. Here is the IBM
The Productivity Center for Replication server can run on either a Windows/Linux-x86 server or a z/OS mainframe server.The Productivity Center for Replication on System z offers all the same new support for SVC and DS8000, as well asincorporated Basic HyperSwap capability that I mentioned in my post last February[DS8000 Enhancements for the IBM System z10 EC].
Here are the IBM press releases for the TotalStorage Productivity Center for Replication on[Windows/Linux-x86and System z] servers.
I'm at a Business Partner conference today, discussing these announcements and other topics, so need to go back to those festivities.
On StorageZilla, fellow blogger Mark Twomey introduces the latest entrant from EMC to the blogosphere,in his post [Polly Pearson's blog].
Although we share the same name, with the same exact spelling, I would like be the first to point out we are not related, at least as far as I know. Basing solely from her post[Welcome to my Blog - Part 1], sheis a year younger than I am, a lot better looking, majored in communications, and is not afraid to quit acrappy job for a much better job elsewhere. I on the other hand, majored in engineering, but agree wholeheartedly not to stick in a crappy situation. There is such a skills shortage out there in the IT industry,with a cap on U.S. [H-1B visas] at a paltry [65,000 this year]. If you don't like your IT job, you should be able toquit and find another one in the IT industry you are more passionate about.
On a similar theme, over at DrunkenData, Jon Toigo's latest post asks if you are[Feeling Insecure About Your Job?]ScoreLogix’s Job Security Index has fallen in the United States, with a sharp drop specifically for IT jobs. Jon points out that while it might be easy to point out that a number went up or down, it is far more difficultto explain why it did so. He gives a good piece of career advice:
Want to keep your job? Play by the rules of the front office: demonstrate the value of what you do for the company from the standpoint of cost-savings, risk reduction and process improvement. Make yourself indispensable. If they don’t appreciate you then, you need to move on. You will always be hiding in your cubical and sweating a pink slip ...
So shine bright. Be remarkable. It is not always easy to communicate your value in a technical position to cluelessnon-technical managers. Certainly, writing a blog helps. Within IBM, there are over 3500 bloggers. Most postwithin the safe confines behind the firewall, but manage to generate ideas, present valid arguments, and get theconversation rolling with the right set of people that might be difficult otherwise in a company like IBM of over350,000 employees scattered around the world. A few of us daringly blog in full public, and carry the conversationto our clients, prospects, analysts, journalists, Business Partners, and others within the IT industry.
So, Polly Pearson from EMC, although we have never met in person, I too welcome you to the blogosphere!
There is a difference between improving "energy efficiency" versus reducing "power consumption".
Let's consider the average 100 watt light bulb, of which 5 watts generate the desired feature (light), and 95 percent generated as undesired waste (heat). In this case, it would be 5 percent efficient. If you delivered a new light bulb that generated 3 watts of light for only 30 watts of energy, then you would have an offering that was more energy efficient (10 percent instead of 5 percent) and use 70 percent less power (30 watts instead of 100 watts). This new "dim bulb" would not be as bright as the original, but has other desirable energy qualities.
Nearly all of the output of data center equipment results in heat.In The Raised Floor blog [It's Too Darn Hot!], Will Runyon explains how IBM researcher Bruno Michel in Zurich has developed new ways to cool chips with water shot through thousands of nozzles, much like capillaries in the human body. This is just one of many developments that are part of IBM's [Project Big Green]
But what if the desired feature is heat, and the undesired feature is light?In the case of Hasbro's toy[Easy-Bake Oven],a 100W incadescent light bulb is used to bake small cakes. This is generating 95W of desired heat, and onlywasting 5 percent as light (unused inside the oven). That makes this little toy 95 percent energy efficient, butconsumes as much energy as any other 100W light bulb lamp or fixture in your house. With manufacturing switchingfrom incadescent to compact flourescent bulbs, this toy oven may not be around much longer.
While we all joke that it is just a matter of time before our employers make us ride stationary bicycles attached to generators to power our monstrous data centers, 23-year old student Daniel Sheridan designeda see-saw for kids in Africa to play on that generates electricity for nearby schools. [Dan won the "mostinnovative product" at the Enterprise Festival].
Another approach is to improve efficiency by converting previously undesirable outcomes to desirable. Brian Bergstein has a piece in Forbes titled["Heat From Data Center to Warm a Pool"].Here's an excerpt:
"In a few cases, the heat produced by the computers is used to warm nearby offices. In what appears to be a first, the town pool in Uitikon, Switzerland, outside Zurich, will be the beneficiary of the waste heat from a data center recently built by IBM Corp. (nyse: IBM) for GIB-Services AG.
As in all data centers, air conditioners will blast the computers with chilly air - to keep the machines from exceeding their optimum temperature of around 70 degrees - and pump hot air out.
Usually, the hot air is vented outdoors and wasted. In the Uitikon center, it will flow through heat exchangers to warm water that will be pumped into the nearby pool. The town covered the cost of some of the connecting equipment but will get to use the heat for free."
I see a business opportunity here. Next to every data center lamenting about their power and cooling, build a state-of-the-art fitness center for the employees and nearby townspeople. Exercise on a stationary bicyclegenerating electricity, while your kids play on the see-saw generating electricity, and then afterwards thewhole family can take a dip in the heated swimming pool. And if the company subscribes to the notion of a Results-Oriented Work Environment [ROWE],it could encourage its employees to take "fitness" breaks throughout the day, rather than having everyone there in the early morning or late evening hours, leveling out the energy generated.
In explaining the word "archive" we came up with two separate Japanese words. One was "katazukeru", and the other was "shimau". If you are clearing the dinner plates from the table after your meal, for example, it could be done for two reasons. Both words mean "to put away", but the motivation that drives this activity changes the word usage. The first reason, katazukeru, is because the table is important, you need the table to be empty or less cluttered to use it for something else, perhaps play some card game, work on arts and craft, or pay your bills. The second reason, shimau, is because the plates are important, perhaps they are your best tableware, used only for holidays or special occasions only, and you don't want to risk having them broken. As it turns out, IBM supports both senses of the word archive. We offer "space management" when the space on the table, (or disk or database), is more important, so older low-access data can be moved off to less expensive disk or tape. We also offer "data retention" where the data itself is valuable, and must be kept on WORM or non-erasable, non-rewriteable storage to meet business or government regulatory compliance.
The process of archiving your data from primary disk to alternate storage media can satisfy both motivations.
IBM offers software specifically to help with this archival process.For email archive, IBM offers [IBM CommonStore] for Lotus Domino and MicrosoftExchange. For database archive, including support for various ERP and CRM applications, IBM offers [IBM Optim] from the acquisition of Princeton Softech.
The problems occur when companies, under the excuse of simplification or consolidation, feel they can just usetheir backups as archives. They are taking daily backups of their email repositories and databases, and keepingthese for seven to ten years. But what happens when their legal e-discovery team needs to find all emails or database records related to a particular situation, an employee, client or account? Good luck! Most backupsare not indexed for this purpose, so storage admins are stuck restoring many different backups to temporary storage and combing through the files in hopes to find the right data.
Backups are intended for operational recovery of data that is lost or corrupted as a result of hardware failures, application defects, or human error. Disk mirroring or remote replication might help with hardware failures, but any logical deletion or corruption of data is immediately duplicated, so it is not a complete solution. FlashCopy or Snapshot point-in-time copies are useful to go back a short time to recover from logical failures, but since they are usually on the same hardware as the original copies, may not protect against hardware failures. And then there's tape, and while many people malign tape as a backup storage choice, 71 percent of customers send backups to tape, according to a 2007 Forrester Research report.
Backups often aren't viable unless restored to the same hardware platform, with the same operating system and application software to make sense of the ones and zeros. For this reason, people typically only keep two to five backup versions, for no more than 30 days, to support operational recovery scenarios. If you make updatesto your hardware, OS or application software, be sure to remember to take fresh new backups, as the old backupsmay no longer apply.
Archives are different. Often, these are copies that have been "hardened" or "fossilized" so that they make sense even if the original hardware, OS or application software is unavailable. They might be indexed so that they can be searched, so that you only have to retrieve exactly the data you are looking for. Finally, they are often stored with "rendering tools" that are able to display the data using your standard web browser, eliminating the need to have a fully working application environment.
Take any backup you might have from five years ago and try to retrieve the information. Can you do it? This might be a real eye-opener. You might have inherited this backup-as-also-archive approach from someone else, and are trying to figure out what to do differently that makes more sense. Call IBM, we can help.
Tim Ferris started the festivities with [The Grand Illusion: The Real Tim Ferriss speaks]. He claimed that for the past year, he outsourced the writing of his blog to a writer from India, and an editor from the Philippines. Given that his post was dated March 31, and he writes frequently about the benefits of outsourcing, it appeared like a legitimate post. However, Tim fessed up the following day, claiming that it was April 1 in Japan where he wrote it.
Guy Kawasaki wrote[April Fools' Stories You Shouldn't Believe]including my favorite #12 "Ruby on Rails cited Twitter as the centerpiece of its new 'Rails Can Scale' marketing program." Speaking of Twitter, Fellow IBM blogger Alan Lepofsky from our Lotus Notes team wrote[Great, now there is Twitter Spam]. It looked like a real post, but then I realized, ... everything on Twitter is spam!
Topics like energy consumption and global warming were fodder for posts and pranks.The post[Was Earth Hour a joke again?], argued thatthe preparation of "Earth Hour" last week in effect used up more energy than the hour of this annual "lights-off event" actually saved. This reminded me of John Tierney's piece in the New York Times ["How virtuous is Ed Begley, Jr.?"] where a scientist explains that it is more "green" for the environment to drive a car short distances than to walk:
If you walk 1.5 miles, Mr. Goodall calculates, and replace those calories by drinking about a cup of milk, the greenhouse emissions connected with that milk (like methane from the dairy farm and carbon dioxide from the delivery truck) are just about equal to the emissions from a typical car making the same trip. And if there were two of you making the trip, then the car would definitely be the more planet-friendly way to go.
Wayan Vota, my buddy over at OLPCnews, writes in his post[Windows XO Child Centric Development] that the "Sugar" operating environment on the innovative Linux-based XO laptops will soon be re-named the"Windows XO Operating System", with their new motto "Windows XO: A Child-Centric Operating Platform for Learning, Expression and Exploration." The mocked up photo of an XO laptop with the Windows XO logo was excellent!
The economists from Freakonomics explain in [And While You're at it, Toss the Nickel] that it costs the US Government 1.7 cents to produce each penny. The US government loses $50 million dollars each year making pennies. Each nickel costs 10 cents to produce. This one was dated March 31, so it could actually be true. Sad, but true.
My favorite, however, was EMC blogger Barry Burke's post["5773 > c"] explaining howtheir scientists were able to reduce latency on the EMC SRDF disk replication capability:
What the de-dupe team found is that there is a hidden feature within recent generations of this chip that allow a single bit, under certain circumstances, to represent TWO bits of information.
Still, almost 34% of the total bits transferred were in fact aligned double-zeros, far more than all other bit combinations - and most importantly, these were quite frequently byte-aligned, as required by this new-found capability. Makes sense, if you think about it - most of those 32- and 64-bit integers are used to store numbers that are relatively small (years, months, days, credit charges, account balances, etc.). So that's why the team decided to use this new two-fer bit to represent "00".
Mathematically, if you can transmit 34% of the data using half as many bits, you reduce the number of bits you have to transfer in total by 17%. Which, while not necessarily earth-shattering, is nothing to be ashamed of. On top of the SRDF performance enhancements delivered in 5772 (30% reduction in latency or 2x the distance), this new enhancement adds another 17% latency improvement (or ~1.4x more distance at the same latency). Combined with 5772, SRDF/S customers could see a 50% reduction in latency. And 5773 allows SRDF/A cycle times to be set below 5 seconds (with RPQ) - this new feature adds a little headroom to maximize bandwidth efficiency for the shortest possible RPO.
Again, this looked real, until I did the math. Start with the speed of light in a vacuum of space ("c" in BarryB's title) which is roughly 300,000 kilometers per second, or put into more understandable units, 300 kilometers per millisecond. However, light travels slower through all other materials, and for fiber optic glass it is only 200 kilometers per millisecond. Sending a block of data across 100km, and then getting a response back that it arrived safely, is a total round-trip distance of 200km, so roughly 1 millisecond. However, EMC SRDF often takes two or three round-trips per write, versus IBM Metro Mirror on the IBM System Storage DS8000 which has got this down to a single round-trip. The number of round-trips has a much bigger effect on latency than EMC's double-bit data compression technique. With IBM, you only experience about 1 millisecond latency per write for every 100km distance between locations, the shortest latency in the industry.
It is good that once a year, you should be skeptical of what you read in the blogosphere, and sometimes check the facts!
My father's favorite question is "What's the worst that could happen?" He is retired now, but workedat the famous [Kitt Peak National Observatory] designing some of the largesttelescopes. Designing telescopes followed well-established mechanical engineering best practices, but each design was unique,so there was always a chance that the end result would not deliver the expected results. What's the worst that can happen? For telescopes, a few billion dollars are wasted and a few years are added to the schedule. Scrap it and start over. Nothing unrecoverable for the US government with unlimited resources and patience.
... the rest of the grimness on the front page today will matter a bit, though, if two men pursuing a lawsuit in federal court in Hawaii turn out to be right. They think a giant particle accelerator that will begin smashing protons together outside Geneva this summer might produce a black hole or something else that will spell the end of the Earth — and maybe the universe.
Scientists say that is very unlikely — though they have done some checking just to make sure.
The world’s physicists have spent 14 years and $8 billion (US dollars) building the Large Hadron Collider, in which the colliding protons will recreate energies and conditions last seen a trillionth of a second after the Big Bang. Researchers will sift the debris from these primordial recreations for clues to the nature of mass and new forces and symmetries of nature.
But Walter L. Wagner and Luis Sancho contend that scientists at the European Center for Nuclear Research, or CERN, have played down the chances that the collider could produce, among other horrors, a tiny black hole, which, they say, could eat the Earth. Or it could spit out something called a “strangelet” that would convert our planet to a shrunken dense dead lump of something called “strange matter.” Their suit also says CERN has failed to provide an environmental impact statement as required under the National Environmental Policy Act.
Although it sounds bizarre, the case touches on a serious issue that has bothered scholars and scientists in recent years — namely how to estimate the risk of new groundbreaking experiments and who gets to decide whether or not to go ahead.
What's the worst that can happen? Scientists now agree that it is sometimes difficult to predict, and someeffects may be unrecoverable.
Unfortunately, this is not the only example of people attempting things they may not understand well enough. Theweb comic below has someone complaining they are out of disk space, and the sales rep suggests solving this with a few commands which will result in deleting all her files. Hopefully, most people reading will recognize this is meant as humor, and not actually attempt the code fragments to "see what they do".
This is a webcomic called "Geek and Poke". If you dare to read the punchline, click here: Funny Geeks - Part 5.
Warning: Do not try the code fragments unless you know what to expect!
Sadly, I often encounter clients who have a "keep forever" approach to their production data. When they are seriously out of space, they feel forced to either buy more disk storage, or start "the big Purge": deleting rows from their database tables, emails older than 90 days, or some other drastic measures. With a focus on keeping down IT budgets, I fear that thesedrastic measures are growing more common. What's the worst that could happen? You might need that data for defending yourself against a lawsuit, or need it to continue to provide service to a loyal client, or just continue normal business operations.I have visited companies where a junior administrator chose the "big Purge" option, without a full understanding ofwhat they were doing, resulting in business disruption until the data could be recovered or re-entered.
IBM offers a better way. Data that may not be needed on disk forever could be moved to lower-cost tape, using up less energy and less floorspace in your data center. Solutions can automatically delete the data systematically based on chronological or event-based retention policies, with the option to keep some data longer in response to a "legal hold" request.
That's certainly better than to risk shrinking your business into a "dense dead lump"!
I got some interesting queries about IBM's Scale-Out File Services [SoFS] that I mentioned in my post yesterday [Area rugs versus Wall-to-Wall carpeting]. I thought I would provide some additional details of the product.
SoFS combines three key features: a global namespace, a clustered file system, and Information LifecycleManagement (ILM). Let's tackle each one.
Global Name Space
A long time ago, IBM acquired a company called Transarc that developed Andrew File System (AFS) and DistributedFile System (DFS). These both provided global namespace capability, meaning that all of your files could beaccessible from a single URL file tree. Imagine if you have data centers in Tucson, Austin, Raleigh and Chicago.Normally, to access files from each city, you would have to mount a unique IP address for that location, and thento get to files in a different city, you'd have to mount a second, and so on. But with a global namespace, you could mount a single drive letter Z: and access files simply by using Z:/Tucson/abc or Z:/Austin/xyz. IBM uses its DFS to make this happen.
Just because you have access to a global namespace doesn't give you read/write authority to every file. IBM SoFS has full NTFS Access Control List (ACL) support, so that only those who can read or write data can access the files. A "hide unreadable" feature provideswhat I like to call "parental controls": you don't even get to see on your directly list any file or subdirectory that you don't have access to. For example, if there is a directory with 50 projects, but you only have authority tothree projects, then you only see the three subdirectories related to those projects, and nothing else.
There are other ways to get a global namespace. IBM also offers the IBM System Storage N series Virtual FileManager, Brocade offers Storage/X, and F5 acquired Acopia. These all work by putting a box in front of a set ofindependent NAS storage units, and giving you a single mount point to represent all of the file systems managedbehind the scenes. This however can sometimes be a bottleneck for performance.
Clustered File System
Often, when you have a lot of data in one place, you are also expected to deliver that data to lots of clientswith relatively good performance. Otherwise, end users revolt and get their own internal direct attach storage.To solve this, you need a clustered architecture that provides access in parallel to the data.
First, we start with a node that is optimized for CIFS and NFS access. We have clocked our node to run CIFS at577 MB/sec, and NFS at 880 MB/sec, through a 10GbE pipe between a single client and a single SoFS node. Comparethat to the 400 MB/sec you get today with 4Gbps FCP, or the 800 MB/sec you will get if you upgrade to 8 GbpsFCP, and quickly you recognize that this is comparable performance for demanding workloads.
Then, you combine multiple nodes together, and have them all be able to read/write any file in the file system, andfront-end that with a load-balancing Virtual IP address (VIPA) that spreads the requests around, and you've gotyourself a lean and mean machine for accessing data.
In 2005, IBM delivered[ASC Purple] with the world's fastest file system. 1536 nodeswere able to access billions of files in the 2 Petabyte of data. The record of 126 GB/sec access to a single filewas set, and has yet to be beaten by any other vendor since.This same file system is used in SoFS, as well as a variety of other IBM storage offerings.
The back-end storage can be SAS or FC-attached, from the DS3200 to our mighty DS8300 Turbo, as well as ourIBM System Storage DCS9550 and SAN Volume Controller (SVC), and a variety of tape libraries.
Information Lifecycle Management
Lastly, we get to ILM. With SoFS, you can have different tiers of storage, high-speed SAS or FC disk, low-speedFATA and SATA disk, and even tape. Policy-based automation allows you to place any file onto any disk tier whencreated, and other policies can migrate or delete the data trigged by certain threshold, age, or other criteria.The advantage is that this is on a file by file basis, so Z:/Tucson/Project could have a bunch of files, some ofthem on my FC disk, some of them on my SATA, and some on tape. The file path doesn't change when they move, anddifferent files in the same directory can be on different tiers.
Data movement is bi-directional. If you know you will be using a set of files for an upcoming job, say perhapsquarter-end or year-end processing, you can pre-fetch those files from tape and move them to your fastest disk pool.
There is also integrated backup support. Typically, a large NAS environment is difficult to backup. Traditionalmethods take days to scan the directory tree looking for files in need of backup. A single SoFS node can scana billion files in 95 minutes, and 8 nodes in a cluster can scan a billion files in under 15 minutes.
Recovery is even more impressive. When you recover, SoFS brings back the entire directory structure first, withall the file names in place. This would make it appear that all the data is restored, but actually it is still on tape.When you access individual files, it will then drive the recovery of that file, so your applications and end usersbasically determine the priority of the recovery. Traditional methods would wait until every file was restoredbefore letting anyone access the system.
SoFS is part of IBM's [Blue Cloud] initiativethat was launched last November 2007. Of course, IBM isn't the only one competing in this space. HDS has partneredwith BlueArc, HP has acquired PolyServe, and Sun acquired CFS for their Lustre file system. Isilon and Exanet arestart-up companies with some offerings. EMC acquired Rainfinity,and have hinted at a Hulk/Maui project that they might deliver later this year or perhaps in 2009, but by thenmight be a dollar-short and a day-late.
But why wait? IBM SoFS is available today and is orders of magnitude more scalable!
As a consultant, I am often asked to help design the architecture for the information infrastructure. A usefulanalogy to gather requirements and preferences is the difference between area rugs and wall-to-wall carpeting. Arearugs are not secured to the floor and cover only a portion of the floor area. Carpets are generally tacked or cemented to the floor, often with an underlay of cushion padding, stretched across the entire floor surface, out to all four walls of each room.
Each has its pros and cons, and often is a matter of preference. Some people like area rugs because they can choosea different style for each room, match the decor and color scheme of furniture, and use these to define each livingspace. Ever since paleolithic man put animal skins on the floor of their cave, people recognize that cold, hard andugly floors could be covered up with something soft and more attractive.Others prefer wall-to-wall carpeting because they want to walk around the house barefoot, have their young children crawl on their hands and knees, and give the entire house a unified look and feel. This is often an inexpensive option when compared against the cost of individual rugs.
The same is true for an information infrastructure. For some, they prefer the "area rug" approach: this style ofstorage for their email, this other type of storage for their databases, and perhaps a third for their unstructuredfile systems. When customers ask what storage would I recommend for their SAP application, or their Microsoft Exchangeemail environment, or their Business Intelligence (BI) software, I recognize they are taking this "area rug" approach.
Like area rugs, having different storage can focus on specific attributes of the workload characteristics. It alsoinsulates against company-wide changes, the dreaded "rip-and-replace" of replacing all of your storage with somethingfrom a different vendor. With "area rug" storage, you can support a dual-vendor or multi-vendor strategy, and upgrade or replace each on its own schedule.
Thanks to open standards and industry-standard benchmarks, changing out one storage solution for another is assimple as rolling up an area rug, and putting another one in its place that is similar in size dimensions.
Others may prefer "wall-to-wall carpeting" approach: one disk system type, one tape library type,one network type, that provides unified management and minimizes the needs for unique skills. Generally, the choice of NAS, SAN or iSCSI infrastrucutre is done company-wide, and might strongly influence the set of products that will support that decision. For example, those with a mix of mainframe and distributed servers looking for SAN-attached storage may look at an [IBM System Storage DS8000] and [TS3500 tape library] that can provide support for FICON and FCP.
Those looking at NAS or iSCSI might consider the IBM System Storage N series products, "unified storage" supporting iSCSI, FCP and NAS protocols. If you want the "wall-to-wall" to stretch across all the sites in your globally integrated enterprise, IBM's scalable NAS product, Scale-Out File Services[SoFS], provides a global name spacein combination with a clustered file system that provides incredible scalability and performance based on field-proven technology used by the majority of the [Top 100 supercomputer] deployments.
IBM can help you design an information infrastructure that fits either approach.
Soon, the U.S. is switching on-air television signals from analog to digital format. The switch-over happensFebruary 17, 2009. According to the [Federal Communications Commission], Americans haveuntil this Monday, March 31, to request up to two 40-dollar coupons towards the purchase of digital-to-analog converter boxesso that the on-air digital signals can be used with existing analog-only television equipment.
(For my readers outside the United States, a bit of background explanation may be necessary. Americans consider access to television a self-evident and unalienable right.According to a Pew Research report[Luxury or Necessity?] 64 percent of Americansconsider a television set a necessity, and 33 percent consider paid providers, like cable or satellite, a necessity.Even prisoners in U.S. jails are allowed to watch television!)
Taking advantage of the "Y2K crisis" like nature of this 2/17/2009 deadline, paid providers have been advertisingthat this deadline only applies to on-air customers. Those who have cable or satellite can continue to use theiranalog equipment. I have been a subscriber for Cox Cable for some time, and my parents recently made the switchas well. Two weeks ago, however, my parents called me in a panic. Cox Cable chose to move one channel, TurnerClassic Movies (TCM), over from their analog line-up over to their digital line-up. They thought this wasn't goingto happen until 2/17/2009! They asked me to investigate and provide them alternative options.
I spoke to a Cox Cable representative.
Did Turner force Cox Cable to do this? Did they digitize their entire collection of movies? No, Cox Cable is choosing to send the TCM signal over the digital bandwidth, and they are converted back to analog by their set-top box.
Do customers who now get one less channel get a discount? No, same price, less service.
Why move a single channel over? Eventually, everything is going digital, and this is a small "baby step" to getpeople to switch over.
But TCM is a collection of grainy, black-and-white movies from the 1950s and 1960s, it is probably the channelthat gets the least benefit to convert to digital. Why choose TCM specifically? TCM is "commercial-free" so providesno additional revenue opportunity. Moving this to digital frees up an analog channel to run a new "on demand" servicethat could generate additional revenue for Cox Cable.
What would it take in terms of additional cost and equipment to watch the TCM in digital?A set-top digital box from Cox Cable, which costs one-time 10 dollars to install by a professional technician, plus 11 dollars per month for the extra "service" provided.
Do I need a High-Def television set or other equipment? No, the digital signal for TCM is standard format, so no HD equipment required.
I currently split my cable signal, so that I can watch one channel and record another, or record two separate channels at the same time, using a standard format VCR and Tivo, can I continue to do this with the digital set-top box? Yes, absolutely.
I decided to give it a try, and a technician was scheduled to perform the installation last Sunday, which was Easter holiday for some people. The technician was able to connect the set-top box directly to my television set, but thesignal is converted to a single "Channel 3", forcing the use of a separate Cox Cable remote control unit to set the channel on the set-top box. He set the set-top box to TCM (channel 199) and showed that the TCM channel was now available again.
How would my VCR or Tivo record anything? You have to set the set-top box manually to the appropriate channel desired, then set the VCR or Tivo to record "Channel 3".
How would I record one channel while watching another? That does not appear possible with this set-top box. If we split before entering the set-top box, then that equipment would get the analog channels only, not TCM.
How about recording two different channels concurrently? No way.
I feel bad for the technician. He spent two hours on his Easter Sunday to install service that I was told by theirsales rep would work with my equipment, only to find out it won't and he ended up having to take it all back out andcancel the work order. He doesn't even get paid overtime for this.
So, I am back to where I was before, analog channels minus the TCM channel. However, the lesson is clear, eventuallyeverything is going to digital, and people may not realize what this means to them.
Yesterday marked the first day of Spring here in the Northern hemisphere, and often this means it is timefor some "Spring cleaning". This is a great time to re-evaluate all of your stuff and clean house.
In the bits-vs-atoms discussion, Annie Leonard has a quick [20-minute video] about the atoms side of stuff,from extraction of natural resources, production, distribution, consumption, to final disposal.
On the bits side of things, the picture is much different.
We don't really extract information,rather we capture it, and lately that process is done directly into digital formats, from digital photography, digital recording of music, and so on. A lot of medical equipmentnow take X-rays and other medical images directly into digital format. By 2011, it is estimated that as much as 30 percent of all storage will be for holding medical images.
Production refers to the process of combining raw materials and making them into something useful. The sameapplies to information, there are a variety of ways to make information more presentable. In the Web 2.0 world, these are called Mashups, combiningraw information in a manner that are more usable.Fellow IBM blogger Bob Sutor discusses IBM's latest contribution, SMash, in his post[Secure Mashups via SMash].
According to Tim Sanders, 90 percent of business information is distributed by email, but less than 10 percentof employees are formally trained to distribute information correctly. Here's a quick 3-minute trailerto his "Dirty Dozen" rules of how to do email properly.
I have not watched the DVD that this trailer is promoting, but I certainly agree with the overall concept.
This week I also had the pleasure to hear [Art Mortell], author ofthe book The Courage to Fail: Art Mortell's Secrets to Business Success. He gave an inspirational talk about how to deal with our stressful lives. One key pointwas that stress often came from our own expectations. This is certainly true on how we consume information.Often times our expectations determine how well we read, watch or listen to information being presented.Sometimes information is factually correct, but presented in such a boring manner that it is just toodifficult to consume.
John Windsor on YouBlog takes this one step further, asking [Are you predictable?]He makes a strong case on why presenting in a predictable manner can actually hurt your chances of communication.
And finally, there is disposal. We are all a bunch of digital pack-rats. With atoms, you eventuallyrun out of closet space, with bits the problem is not as obvious, and often can be resolved by spendingyour way out of it. On average, companies are expanding their storage capacity by 57 percent every year. Thatworked well when dollar-per-GB prices of disk dropped to match, but now technology advancements are slowing down. Diskwill not be dropping in price as fast as you need, and now might be a good time to re-evaluate your"Keep everything forever" strategy.
Consider "Spring cleaning" to be an excellent excuse to evaluate the data you have on your disk systems.Should it be on disk? Will it be accessed often enough to justify that cost? Does it need immediateonline access times, or can waiting a minute or two for a tape mount from an automated library be sufficient?Does it represent business value?
I have been to customers that have discovered a lot of "orphan data" on their disk systems. This isdata that does not belong to anyone currently working at the company. Maybe the owners of the data retired,were laid off, or even fired, but nobody bothered to clean up their files after they left the company.
I've also seen a lot of "stale data" on disk, data that has not be read or written in the past 90 days.Are you spending 13-18 watts of energy to spin each disk drive just to contain data nobody ever looks at?
In some cases, orphan or stale data represents business value, and need to be kept around for businessor legal reasons. Perhaps some government regulation requires you to retain this information for someyears. In that case, rather than deleting it, move it to tape, perhaps using theIBM System Storage DR550 to protect it for the time required and handle its eventual disposal.
Certainly something to think about, while you snap the ears off those chocolate bunnies, watching yourkids run around looking for eggs. Enjoy your weekend!
Jon Toigo over at DrunkenData writes in his post[A Wink and a Nod] about thebenefits of the new IBM System z10 Enterprise Class mainframe. Here's an excerpt about storage:
"The other key point worth making about this scenario is that storage behind a z10 must conform to IBM DASD rules. That means no more BS standards wars between knuckle-draggers in the storage world who continue to mitigate the heterogeneous interoperability and manageability of distributed systems storage using proprietary lock in technologies designed as much to lock in the consumer and lock out the competition as to deliver any real value. That has got to be worth something."
For z/OS and TPF operating systems, disk must support CCW commands over ESCON or FICON connections, or NFS commandsover the Local Area Network. However, most of the workloads that are being ported over from x86 platforms willprobably be running Linux on System z images, and as such Linux supports both CCW and SCSI protocols, the latterover native FCP connections through a Storage Area Network (SAN) or via iSCSI over the Local Area Network. Many SAN directors support both FCP and FICON, and the z10 also supports both 1Gbps and 10Gbps Ethernet, so you may not have to invest in any new networking gear.
The best part is that you may not have to migrate your data. The IBM System Storage SAN Volume Controller is supported for Linux on System z, and with "image mode" you can leave the data in its original format on its original disk array. Many file systems are now supported by Linux, including Windows NTFS with the latest NTFS-3G driver.
If your data is already on NAS storage, such as the IBM System Storage N series disk systems, then the IBM z10can access it directly, from z/OS, z/VM or Linux.
Have lots of LTO tape data? Linux on System z supports LTO as well.
Jon continues his rant with a question about porting Microsoft Windows applications. Here's another excerpt:
"For one, what do we do with all the Microsoft servers. There is no Redmond-sanctioned approach to my knowledge for virtualizing Microsoft SQL Server or Exchange Server in a mainframe partition."
Yes, it is possible to run Windows on a mainframe through emulation, but I feel that's the wrong approach. Instead, the focus should be on running "functionally equivalent" programs on the native mainframe operating systems, and again Linuxis often the best choice for this. Switching from Windows to Linux may not be "Redmond-sanctioned", but it getsthe job done.
Instead of SQL Server, consider something functionally equivalent like IBM's DB2 Universal Database, or perhaps an open source database like MySQL, PostgreSQL or Apache Derby. Well-written applications use standard SQL calls, so ifthe application does not try to use unique, proprietary features of MS SQL Server, you are in good shape.
In my discussion last November on [Microsoft Exchange email server], I mentioned that Bynari makes a functionally equivalent email server on Linux that works with your existing Microsoft Outlook clients. Your end-users wouldn't know you migrated to a mainframe! (well, they might notice their email runs faster)
So if your data center has three or more racks of Sun, Dell or HP "pizza box" or "blade" x86 servers, chances are you can migrate the processing over to a shiny new IBM z10 EC mainframe, save some money in the process, without too much impact to your existing Ethernet, SAN or storage system infrastructure. IBM can even help you dispose of the oldx86 machines so that their toxic chemicals don't end up in any landfill.
I figured I need to say something about "green" on this special holiday (and yes, I am partially Irish, andthe majority of my siblings have bright red hair and freckles as it runs through my family)
Last week, I had the pleasure to meet [Dr. Jia Chen]. She has a PhDin nanotechnology and works in IBM's Watson Research Center. She is recognized as one of the top 35 scientistsunder 35 years of age by MIT, top 15 of the "Nano 50", and one of the top 80 in the National Academy of Engineering.
The two of us presented to clients at the BMW Performance Center in Greenville, SC, on the topic of the "Green" IT data center. She covered all of the advancements IBM is making on the server side, and I coveredall the things on the storage side.
The BMW Performance center is part "briefing conference location" and part "driving school". Everyone had a greattime watching the crazy stunts of the professional drivers skidding and spinning on a closed course. Some hadthe opportunity to actually drive or ride in the cars themselves.
BMW is introducing its own "energy efficiency initiative" with their [X3 Hybrid] vehicle,which will be manufactured in Greenville, SC plant.
A [recent survey] conductedby Fleishman-Hillard Researchindicates that the majority of disk-only customers are now lookingat adding tape back into their infrastructure. Here are some excerpts:
"Over two thirds of surveyed businesses said they were lookingto add tape storage back into their overall network infrastructure and of those respondents, over80-percent plan to add tape storage solutions within the next 12 months.The survey, which was taken in the fourth quarter of 2007, focused on the views of morethan 200 network administrators and mid-level tech specialists at mid-size to large companiesthroughout the United States.
The integration of tape storage into a tiered information infrastructure is highly strategic forcustomers, due to its low cost of ownership, low energy consumption and portability for dataprotection, said Cindy Grossman, Vice President of Tape Storage Systems, IBM. LTO tapetechnology is a perfect choice for enterprise and mid-sized customer with its proven reliability, highcapacity, high performance and ability to address data security with built-in encryption and dataretention requirements for the evolving data center.
According to the survey, 58-percent of the respondents use a combination of disk and tapefor long term archiving, 24-percent use tape exclusively, and 18-percent employ a disk-onlyapproach. In this group, 68-percent of the current disk-only users plan to start using tape for longtermarchiving, and over half (58-percent) plan to add tape for short-term data protection.The survey findings suggest that disk-only users may be experiencing a bit of buyer sremorse, said David Geddes, senior vice president at Fleishman-Hillard Research, who oversawthe study. We found that a wide majority of companies that employ purely disk-basedapproaches are looking to quickly include tape in their backup and archiving strategies."
While disk provides online data access and availability, tape provides additional data protectionand security, lower total cost of ownership (TCO), lower energy consumption (Tape is more "green"),and can be an important part of a long term data retention and compliance strategy.
Disk is more costly, more energy hungry, and some data, although it must be retained, may seldom, if ever be looked at, so why keep it spinning?
Speaking of TCO, in a recent 5-year TCO analysis by the Clipper Group titled[“Disk and Tape Square Off Again”]stored 2.4PB of data long term on SATA disk and on an LTO tape library, the disk system was:23:1 more costly, used 290 times the amount of energy than tapeEven with a data dedupe system like IBM System Storage N series, disk was still 5 times more costly than the tape system.
The Linear Tape Open (LTO) consortium --consisting of IBM, Hewlett-Packard (HP) and Quantum-- just released its "LTO-5" plans. With 2:1 compression,you will be able to pack up to 3TB of data on a single tape cartridge. And while dollar-per-GB declinefor disk is slowing down to 25-30 percent per year, tape continues to decline at a healthy 40 percent rate, so the price gap between diskand tape will actually widen even further over the next few years.
At IBM, our standard is to have a limit of 200MB per user mailbox. A few of us get exceptions and have up to500MB limit because of the work we do. By comparison, my personal Gmail account is now up to 6500MB. Whenthis limit is exceeded, you are unable to send out any mail until it is brought down below the limit, and a request to be "re-enabled for send" is approved, a situation we call "mail jail".
The biggest culprit are attachments. Only 10 percent of emails have attachments, but those that do take up 90percent of the total space! People attach a 15MB presentation or document, and copy the world ondistribution list. Everyone saves their notes with these attachments, and soon, the limits are blown. Not surprisingly, deduplication has been cited as a "killer app" to address email storage, exactly for this reason.If all the users have their mailboxes all stored on the same deduplication storage device, it might find theseduplicate blocks, and manage to reduce the space consumed.
A better practice would be to avoid this in the first place. Here are the techniques I use instead:
Point to the document in a database
We are heavy users of Lotus Notes databases. These can be encrypted and controlled with Access Control Lists (ACL)that determine who can create or read documents in each database. Annually, all the database ACLs are validatedso that people can confirm that they continue to have a need-to-know for the documents in each database. Sendinga confidential document as a "document link" to a database entry takes only a few bytes, and all the recipientsthat are already on the ACL have access to that document.
Point to the document on a web page
If the document is available on an internal or external website, just send the URL instead of attaching the file.Again, this takes only a few bytes. We have websites accessible only to all internal employees, websites thatcan be accessed only by a subset of employees with special permissions and credentials based on their job role, and websites that are accessible to our IBM Business Partners.
In my case, if I happen to have a blog posting that answers a question or helps illustrate an idea, I will sendthe "permalink" URL of that blog post in my email.
Point to the document on shared NAS file system
Internally, IBM uses a "Global Storage Architecture" (GSA) based on IBM's Scale-Out File Services [SoFS] with everyone getting initially 10GB of disk space to store files, with the option to request more if needed. The system has policy-based support for placing and migrating older data to tape to reduce actual disk usage, and combines a clustered file system with a global name space.
My SoFS space is now up to 25GB, and I store a lot of presentationsand whitepapers that are useful to others. A URL with "ftp://" or "http://" is all you need to point to a filein this manner, and greatly reduces the need for attachments. I can map my space as "Drive X:" on my Windows system,or as a NFS mount point on my Linux system, which allows me to easily drag files back and forth.
Departments that don't need to offer "worldwide access" use NAS boxes instead, such as the IBM System Storage N series.
Pointing to files in a shared space, rather than as attachments in email, may take some getting used to. I've hada few recipients send me requests such as "can you send that as an attachment (not a URL)" because they plan toread it on the airplane or train, where they won't have online connectivity.
"Have you invested in the latest and greatest in collaboration technology but still feel people are still not collaborating? How many Microsoft Sharepoint servers and IBM Quickplaces remain relatively untouched or only used by the organization's technorati? I think it's a big problem because this narrow view of collaboration starts to get the concept a bad name: "yeah, we did collaboration but no one used it." And then there the issue of the vast amount of money wasted and opportunities lost. We can't afford to loose faith in collaboration because the external environment is moving in a direction that mandates we collaborate. The problems we face now and into the future will only increase in complexity and it will require teams of people within and across organizations to solve them."
Well, sending pointers instead of attachments works for me, and has kept me out of "mail jail" for quite some timenow.
IDC, an independent industry analyst firm, put out their 4Q07"Worldwide Disk Storage Systems Quarterly Tracker" report. Here is an excerpts from their [press release]:
"Worldwide external disk storage systems factory revenues posted 9.8 percent year-over-year growth in the fourth quarter of 2007 (4Q07) and totaling $5.3 billion (USD), according to the IDC Worldwide Disk Storage Systems Quarterly Tracker. For the quarter, the total disk storage systems market grew to $7.5 billion (USD), up 7.6 percent from the prior year's fourth quarter. Total disk storage systems capacity shipped reach 1,645 petabytes, growing 56.3 percent."
For those wondering how an industry could grow 56.3 percent in capacity, but only 7.6 percent in revenue, it isbecause the average dollar-per-GB dropped in 2007 from $6.63 down to $4.56 (USD), representing a 31 percent decline.In the past, disk prices dropped 40 to 60 percent each year, so making single digit growth was the best major vendorscould hope for. However, lately this has slowed down to 25 to 35 percent decline, but the client demand for capacity continues at the 60 percent pace, which means that vendors could achieve double digit revenue growth soon.
Once again, IBM was ranked number 1 in total disk storage. No surprise there. Here are the details:
"Total Disk Storage Systems Market
In the total worldwide disk storage systems market, IBM lead the market with 22.9 percent followed by HP with 18.1 percent revenue share. EMC maintained the third position with 16.0 percent revenue share.
For the full year, the total disk storage systems market posted 6.6 percent growth to $26.3 billion (USD). In the total worldwide disk storage systems market, IBM and HP lead the market in statistical tie with 20.1 percent and 19.4 percent revenue share, respectively. EMC maintained the third position with 15.2 revenue revenue share."
But why focus just on disk? IDC also released their"Worldwide Combined Disk and Tape Storage 3Q07 Market Share Update", and IBM was number one for that as well,taking in 21.9 percent share. Here's a quote of IBM VP Barry Rudolph in[CNN Money]:
"IBM's continued leadership in the storage hardware market reaffirms our strategy to provide the most comprehensive tiered portfolio of storage offerings, ranging from software and services to disk and tape storage solutions," said Barry Rudolph, Vice President, Storage Stack Solutions, IBM. "IBM is the clear choice for providing information infrastructure solutions that offer the most cost-efficient, streamlined approach to help our customers increase overall productivity and maximize performance."
It is looking like 2008 is going to be a good year for IBM!
IPv4, IPv6, Wireless Mesh networking? No problem! You know linux networking inside and out
Extensive knowledge of BIND, DHCPD, Squid, Apache, security, etc.
Experience working with [Moodle] would be most excellent (it is basically a PHP web application that maintains MySQL databases for lesson plans, homework assignments and other school related information)
Adept with Python scripting or could learn it quickly. OLPC has standardized on Python for scripting (although knowledge in Perl and PHP won't hurt either)
You look to implement a practical solution that less skilled sysadmins can easily maintain over a cooler but more complicated solution.
You play well with others. You don’t alienate collaborators with rude e-mails that assert your technical superiority (even though you are)
Your primary concern is meeting the educational needs of kids and teachers. Your rate technical awesomeness a distant second to meeting those critical needs.
I've been working with Dev, Bryan and Sulochan for the past three months (remotely here from Tucson, AZ)but we've come to a point where we need on-site expertise. I will continue to provide remote support.
Given the number of readers who have contacted me over the past year looking for an IT job (or a different job because they are not happy where they are), this could be an amazing experience.
It's been a while since I've talked about [Second Life].
The latest post on eightbar[Spimes, Motes and Data centers]discusses IBM's use of virtual world technology to analyze data centers in three dimensions.New World Note asks[What's The Point Of 3D Data Centers?]One would think that a simple monitoring tool based on a two-dimensional floor plan would be enough to evaluate a data center.
Enter Michael Osias, IBM (a.k.a Illuminous Beltran in Second Life). Some of the leading news sites havebegun to notice some 3D data centers that he has helped pioneer. UgoTrade writes up an article aboutMichael and the media attention in [The Wizard of IBM's 3DData Centers].
Of course, in presenting these "Real Life/Second Life" (RL/SL) interactive technologies, IBM is sometimes the target of ridicule. Why? Because IBM is 10 years ahead of everyone else. So, are there aspects of a data center where 3D interfaces makes sense? I think there is.
IBM TotalStorage Productivity Center has an awesome "topology viewer" that shows what servers are connectedto which switches, to which disk systems and tape libraries. This is all done in a 2D diagram, generated dynamicallywith data discovered through open standard interfaces, similar to what you might draw manually with toolslike Visio. Imagine, however, howmore powerful if it were a 3D viewer, with virtual equipment mapped to the physical location of each pieceof hardware on the data center floor, including the position on the rack and location on the data center floor.
Designing computer room air conditioning (CRAC) systems is actually a three dimensional problem. Cold air isfed underneath the raised floor, comes up through strategically placed "vent" tiles, taken in the front ofeach rack. Hot air comes out the back of each rack, and hopefully finds ceiling duct intake to get cooled again.The temperature six inches off the floor is different than the temperature six feet off the floor, and 3Dmonitor tools could be helpful in identifying "hot spots" that need attention. In this case "spimes" representsensors in the 3D virtual world, able to report back information to help diagnose problems or monitor events.
After many people left the mainframe in favor of running a single application per distributed server, the pendulumhas finally swung back. Companies are discovering the many benefits of changing this behavior. "Re-centralization" is the task at hand. Thanks to virtualization of servers, networks and storage, sharing common resources canonce again claim the benefits of economies of scale. In many cases, servers work together in collective unitsfor specific applications that might benefit better if consolidated together onto the same equipment.
IBM's "New Enterprise Data Center" vision recognizes that people will need to focus on the management aspectsof their IT infrastructure, and 3D virtual world technologies might be an effective way to getthe job done.
I am always amused in the manner the IT industry tries to solve problems. Take, for example, theprocess of backups. The simplest approach is to backup everything, and keep "n" versions of that.Simple enough for a small customer who has only a handful of machines, but does not scale well. Inmy post [Times a Million],I coined the phrase "laptop mentality", referring to people's inability to think through solutions in large scale.
Apparently, I am not alone.Steve Duplessie (ESG) wrote in his post[Random Thoughts]:
"I may even get to stop yelling at people to stop doing full backups every week on non-changing data (which is 80 %+) just because that's how they used to do it. They won't have a choice. You can't back up 5X your current data the way you do (or don't) today."
Hu Yoshida (HDS) does a great job explaining that thereare three ways to perform deduplication for backups:
Pre-processing. Have the backup software not backup unchanged data.
Inline processing. Have an index to filter the output of the backup as it sends data to storage.
Post-processing. Have the receiving storage detect duplicates and handle them accordingly.
"A full backup of 1TB data base tablespace is taken on day one. The next day another full backup is taken and only 2GB of that backup has any changes.
Using traditional full backup approaches after 2 nights, the backup capacity required is 2 x 1TB = 2TB
One method of calculating de-duplication ratios could yield a low ratio:
Total de-duplicated backup capacity used = 1TB + 2GB = 1.002TB
If the de-duplication ratio compares the amount of total physical storage used to the total amount that would have been used by traditional backup methods, the ratio = 2TB / 1.002TB = approximately 2:1
Another method of calculating de-duplication ratios could yield a high ratio:
Total de-duplicated backup capacity used still = 1.002TB
If the de-duplication ratio compares the amount of data stored in the most recent (second) backup to the amount that would have been used by traditional backup methods, the ratio 1TB / 2GB = 1000GB / 2GB = 500:1"
While IBM also offers deduplication in the IBM System Storage N series disk systems, I find that for backup, itis often more effective to apply best practices via IBM Tivoli Storage Manager (TSM). Let's take a look at some:
Exclude Operating System files
Why take full backups of your operating system every day? Yes, deduplication will find a lot to reduce fromthis, but best practices would exclude these. TSM has an include/exclude list, and the default version excludesall the operating system files that would be recovered from "bare machine recovery" or "new system install"procedures. Often, if the replacement machine has different gear inside, your OS backups aren't what you need,and a fresh OS install may determine this and install different drivers or different settings.
Exclude Application programs
Again, yes if there are several machines running the same application, you probably have opportunity for deduplication. However, unless you match these up with the appropriate registry or settings buried down in theoperating system, recovering just application program files may render an unusable system. Applications are bestinstalled from a common source that are either "pushed" through software distribution, or "pulled" from an application installation space.
If you have TB-sized databases, and are only doing full backups daily to protect it, have I got a solution for you.IBM and others have software that are "application-aware" and "database-aware" enough to determine what haschanged since the last backup and copy only that delta. Taking advantage of the TSM Application ProgrammingInterface (API) allows for both IBM and third party tools to take these delta backups correctly.
Which leaves us with user files, which are often unique enough on their own from the files of other users,that would not benefit from file-level deduplication. Backing up changed data only, as TSM does with its patented ["progressive incremental backup"] method, generally gets most of the benefits described by deduplication, without having to purchase storage hardware features.
Of course, if two or more users have identical files, the question might be why these are not stored on acommon file share. NAS file share repositories can greatly reduce each user keeping their own set of duplicates.It is interesting that some block-oriented deduplication,such as that found in the IBM System Storage N series, can get some benefit because some user files are oftenderivatives of other files, and there might be some 4 KB blocks of data in common.
Last November, I visited a customer in Canada. All of their problems were a direct result of taking full backupsevery weekend. It put a strain on their network; it used up too many disk and tape resources; and it took too long tocomplete. They asked about virtual tape libraries, deduplication, and anything else that could help them. The answer was simple: switch to IBM Tivoli Storage Manager and apply best practices.
On Tuesday, I covered much of the Feb 26 announcements, but left the IBM System Storage DS8000 for today so that it can haveits own special focus.
Many of the enhancements relate to z/OS Global Mirror, which we formerly called eXtended Remote Copy or "XRC", not to be confused with our "regular" Global Mirror that applies to all data. For those not familiar with z/OS Global Mirror, here is how it works. The production mainframe writes updates to the DS8000, and the DS8000 keeps track of these in cache until a "reader" can pull them over to the secondary location.The "reader" is called System Data Mover (SDM) which runs in its own address space under z/OS operating system. Thanks to some work my team did several years ago, z/OS Global Mirror was able to extend beyond z/OS volumes and include Linux on System z data. Linux on System z can use a "Compatible Disk Layout" (CDL) format (now the default) that meetsall the requirements to be included in the copy session.
IBM has over 300 deployments of z/OS Global Mirror, mostly banks, brokerages and insurance companies. The feature can keep tens of thousands of volumes in one big "consistency group" and asynchronously mirror them to any distance on the planet, with the secondary copy recovery point objective (RPO) only a few seconds behind the primary.
Extended Distance FICON
Extended Distance FICON is an enhancement to the industry-standard FICON architecture (FC-SB-3) that can help avoid degradation of performance at extended distances by implementing a new protocol for "persistent" Information Unit (IU) pacing. This deals with the number of packets in flight between servers and storage separated by long distances, andcan keep a link fully utilized at 4Gpbs FICON up to 50 kilometers. This is particularly important for z/OS GlobalMirror "reader" System Data Mover (SDM). By having many "reads" in flight, this enhancementcan help reduce the need for spoofing or channel-extender equipment, or allow you to choose lower-costchannel extenders based on "frame-forwarding" technology. All of this helps reduce your total cost of ownership (TCO)for a complete end-to-end solution.
This feature will be available in March as a no-charge update to the DS8000 microcode.For more details, see the [IBM Press Release]
z/OS Global Mirror process offload to zIIP processors
To understand this one, you need to understand the different "specialty engines" available on the System z.
On distributed systems where you run a single application on a single piece of server hardware, you mightpay "per server", "per processor" or lately "per core" for dual-core and quad-core processors. Software vendors were looking for a way to charge smaller companies less, and larger companies more. However, you might end up paying the same whether you use 1GHz Intelor 4GHz Intel processor, even though the latter can do four times more work per unit time.
The mainframe has a few processors for hundreds or thousands of business applications.In the beginning, all engines on a mainframe were general-purpose "Central Processor" or CP engines. Based on theircycle rate, IBM was able to publish the number of Million Instructions per Second (MIPS) that a machine witha given number of CP engines can do. With the introduction of side co-processors, this was changed to "Millionsof Service Units" or MSU. Software licensing can charge per MSU, and this allows applications running in aslittle as one percent of a processor to get appropriately charged.
One of the first specialty engines was the IFL, the "Integrated Facility for Linux". This was a CP designatedto only run z/VM and Linux on the mainframe. You could "buy" an IFL on your mainframe much cheaper than a CP,and none of your z/OS application software would count it in the MSU calculations because z/OS can't run on theIFL. This made it very practical to run new Linux workloads.
In 2004, IBM introduced "z Application Assist Processor" (zAAP) engines to run Java, and in 2006, the "z Integrated Information Processor" (zIIP) engines to run database and background data movement activities.By not having these counted in the MSU number for business applications, it greatly reduced the cost for mainframe software.
Tuesday's announcement is that the SDM "reader" will now run in a zIIP engine, reducing the costs for applicationsthat run on that machine. Note that the CP, IFL, zAAP and zIIP engines are all identical cores. The z10 EC hasup to 64 of these (16 quad-core) and you can designate any core as any of these engine types.
Faster z/OS Global Mirror Incremental Resync
One way to set up a 3-site disaster recovery protection is to have your production synchronously mirrored to a second site nearby, and at the same time asynchronously mirrored to a remote location. On the System z,you can have site "A" using synchronous IBM System Storage Metro Mirror over to nearby site "B", and alsohave site "A" sending data over to size "C" using z/OS Global Mirror. This is called "Metro z/OS Global Mirror"or "MzGM" for short.
In the past, if the disk in site A failed, you would switch over to site B, and then send all the data all over again. This is because site B was not tracking what the SDM reader had or had not yet processed.With Tuesday's announcement, IBM has developed an "incremental resync" where site B figures out what theincremental delta is to connect to the z/OS Global Mirror at site "C", and this is 95% faster than sendingall the data over.
IBM Basic HyperSwap for z/OS
What if you are sending all of your data from one location to another, and one disk system fails? Do you declare a disaster and switch over entirely? With HyperSwap, you only switch over the disk systems, but leave therest of the servers alone. In the past, this involved hiring IBM Global Technology Services to implementa Geographically Dispersed Parallel Sysplex (GDPS) with software that monitors the situation and updates thez/OS operating system when a HyperSwap had occurred. All application I/O that were writing to the primary locationare automatically re-routed to the disks at the secondary location. HyperSwap can do this for all the disk systems involved,allowing applications at the primary location to continue running uninterrupted.
HyperSwap is a very popular feature, but not everyone has implemented the advanced GDPS capabilities.To address this, IBM now offers "Basic HyperSwap", which is actually going to be shipped as IBMTotalStorage Productivity Center for Replication Basic Edition for System z. This will run in a z/OSaddress space, and use either the DB2 RDBMS you already have, or provide you Apache Derby database for thosefew out there who don't have DB2 on their mainframe already.
Update: There has been some confusion on this last point, so let me explain the keydifferences between the different levels of service:
Basic HyperSwap: single-site high availability for the disk systems only
GDPS/PPRC HyperSwap Manager: single- or multi-site high availability for the disk systems, plus some entry-level disaster recovery capability
GDPS/PPRC: highly automated end-to-end disaster recovery solution for servers, storage and networks
I apologize to all my colleagues who thought I implied that Basic HyperSwap was a full replacement for the morefull-function GDPS service offerings.
Extended Address Volumes (EAV)
Up until now, the largest volume you could have was only 54 GB in size, and many customers still are using 3 GB and 9 GB volume sizes. Now, IBM will introduce 223 GB volumes. You can have any kind of data set on these volumes,but only VSAM data sets can reside on cylinders beyond the first 65,280. That is because many applications still thinkthat 65,280 is the largest cylinder number you can have.
This is important because a mainframe, or a set of mainframes clustered together, can only have about 60,000disk volumes total. The 60,000 is actually the Unit Control Block (UCB) limit, and besides disk volumes, youcan have "virtual" PAVs that serve as an alias to existing volumes to provide concurrent access.
Aside from the first item, the Extended Distance FICON, the other enhancements are "preview announcements" which means that IBM has not yet worked out the final details of price, packaging or delivery date. In many cases, the work is done, has been tested in our labs, or running beta in select client locations, but for completeness I am required to make the following disclaimer:
All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice. Availability, prices, ordering information, and terms and conditions will be provided when the product is announced for general availability.
Yesterday, I asked if you were prepared for the future? The future is now. Today, IBM announced its["New Enterprise Data Center"] vision and strategy which spans software, hardware and services in dealing withthe latest challenges that our clients are faced with today, or will face sooner or later this century.
Here's an excerpt:
Align IT with business goals These changes demand that IT improve cost and service delivery, manage escalating complexity, and better secure the enterprise. And aligning IT more closely with the business becomes a primary goal. The new enterprise data center is an evolutionary new model for efficient IT delivery that helps provide the freedom to drive business innovation. Through a service oriented model, IT will be able to better manage costs, improve operational performance and resiliency, and more quickly respond to business needs. This approach will deliver dynamic and seamless access to IT services and resources, improving both productivity and satisfaction.
IBM's Vision for the New Enterprise Data Center The new enterprise data center can improve the integration of people, process, and technology in your business to help you improve efficiency and effectiveness. As you implement a new enterprise data center strategy, your infrastructure becomes open, efficient, and easy to manage. And your IT staff can move from a focus on fixing IT problems to solving business challenges. Ultimately your processes become standardized and efficient, focused on business needs rather than technology.
A lot was announced today, so I will give a quick recap now, and cover specific areas over the rest of the week.
IBM System z10 Enteprise Class
IBM introduces its most powerful mainframe. Before you think "Wait, that's a mainframe, that doesn't apply to me"stop to consider all that IBM has done to make the mainframe an "open system" without sacrificing security oravailability:
Open standard connectivity, including TCP/IP and now 6Gbps Infiniband and 10GbE Ethernet.
Unix System Services. Yes, z/OS is certified to provide UNIX interfaces for today's applications.
HFS and zFS file systems that can be mounted, shared, and used by traditional z/OS applications and JCL.
Linux and Java. Many of today's largest websites are run on mainframes behind the scenes.
Extreme bandwidth. The z10 EC handles up to 336 FICON channels (4Gbps) for large data processing workloads
The z10 EC is as powerful as 1,500 x86 (such as Intel or AMD) servers, but consumes 85 percent less floorspace and85 percent less energy. (They should put a "green" stripe down the front of this box just to remind everyone how energy efficient this server really is!) For more on the z10 EC, see the[Press Release].
Enhanced IBM System Storage DS8000
With the XIV acquisition taking the role as the best place to put unstructured files for Web 2.0 applications,the IBM DS8000 can focus on its core strength, managing databases and online transactions for the mainframe.There's enough here to justify its own post, so I will cover this later.
IT Service Management Center for z (ITSMCz)
Trust me, I don't make up these acronyms. IT Service Management are the policies and procedures for managingan IT environment, such as following the best practices documented in the IT Infrastructure Library (ITIL).In the past, IBM tools have focused on Linux, UNIX and Windows on distributed servers, but today ITSMCz bringsall of that to the mainframe! (or perhaps more correct to say "brings the mainframe to all that"!)
IT Transformation & Optimization - Infrastructure Strategy and Planning services
I don't make up the names of our service offerings either. However, one thing is clear, it is time for peopleto re-evaluate their current data center, and come up with a new plan. The average data center is 15 years old.According to Gartner Group, more than 70 percent of the world's "Global 1000" organizations will have to make significant modifications to their data centers in the next five years. IBM can help, and is rolling outa new set of services specifically to help clients make this transition, to better align their IT to their business strategies.
Economic Stimulus Package
IBM borrowed this idea from the U.S. government. IBM Global Financing is offering special terms and ratesfor new equipment installed by December 31 this year.
Want to learn more? Read this 15-page[IBM's Vision]white paper.
IBM Developerworks that host this blog suggest posting once per day. General blogging guidelines I have found suggest 300 to 500 words per post. Most magazine and newspaper articles range around 700 words.In my book, [Inside System Storage: Volume I], I had 165 posts covering twelve months, with an average of 636 words per post.
longer posts, perhaps once a week or less
I've seen several executives adopt this approach. When they have something to say, out comes a long speech,in written form, when the occasion deems it necessary. Some of the more technical blogs adopt this approachalso, going into great detail on product specifications and supporting material to make their case.
Either way, it comes out to perhaps 2000 words per week, that can be 20 posts of 100 words each, four posts that are 500 words each, or one long post for the week. Currently, I post about 2-5 times per week, with posts 500-700 words long. I can try to mix short posts with long ones, to give you readers some variety. Post a comment below on whether you prefer that I do more/shorter or fewer/longer.
As for the future of IT...
In a recent post by fellow blogger (and author) Nick Carr titled [Alan Turing, cloud computing and IT's future], he mentions he has a free download of a 7-page PDF called "IT in 2018: from Turing's machine to the computing cloud." It's a quick read, covering many of thepoints in his most recent book, The Big Switch. Here's an excerpt:
As for computer professionals, the coming of the WorldWide Computer means a realignment of the IT workforce,with some jobs disappearing, some shifting fromusers to suppliers, and others becoming more prominent.On the supplier side, we’ll likely see booming demand for the skills required to design and run reliable,large-scale computing plants. Expertise in parallelprocessing, virtualization, artificial intelligence, energymanagement and cooling, encryption, high-speed networking,and related fields will be coveted and rewarded.Much software will also need to be written orrewritten to run efficiently on the new infrastructure. Ina clear sign of the new labor requirements, Google andIBM have teamed up to spearhead a major educationinitiative aimed at training university students to writeprograms for massively parallel systems.
Some interesting insights from Google can be read in New York Times'Freakonomics blog, where Steve Dubner interviews Google's chief economist: [Hal Varian Answers Your Questions]Hal comes up with some clever answers to some rather tough questions. It's worth a read.
It is good to have futurists like this. However, as we caution in IBM, those who seek a life througha crystal ball... must often settle for a diet of broken glass.I will close with one of my favorite quotes.
"As I've said many times, the future is already here. It's just not very evenly distributed." --- William Gibson (science-fiction author)
So, yes, I may sometimes look at the rear-view mirror. However, there is a common theme from Nick Carr to Steve Dubnerto William Gibson. They also look back to the past to give insights on how things might unfold in the future.
My view is that for some the future is already here. IBM already offers the product, service or solutionthat might be just what you need, but you just haven't gotten it yet. Future for you, but past for us.For others, the future is repeating a pattern we have already seen in the past. Understanding what happened back then helps us be better prepared to understand what is happening now, in the directions and trends we forecast moving forward.
EMC Corporation (NYSE:EMC) today announced it has been positioned as a leader in the Forrester Wave™: Enterprise Open Systems Virtual Tape Library (VTL), Q1 2008 by Forrester Research, Inc. (January 31, 2008), an independent market and technology research firm. EMC achieved a position as a leader in the Forrester Wave report on virtual tape libraries based on the largest installed base of the EMC® Disk Library family of systems, its broad ecosystem interoperability. Virtual tape libraries emulate tape drives and work in conjunction with existing backup software applications, enabling fast backup and restoration of data by using high-capacity, low-cost disk drives.
EMC was the first major vendor in the open systems virtual tape library market as it introduced the EMC Disk Library in April 2004 and today is a leading provider of open systems virtual tape solutions, with systems that are designed for businesses and organizations of all sizes.
While the press release implies that "EDL equals VTL", Chuck tries to explain they are in fact very different. Here is an excerpt from his blog post:
Virtual Tape Libraries vs. Disk Libraries
As many of you know, VTLs have been around for a while. They use disk as a cache -- they buffer the incoming backup streams, do some housekeeping and stacking, then turn around and write tape efficiently. When you go to restore, you're usually coming back off of tape, unless the backup image in question is sitting in the disk cache.
Now, there is nothing wrong with the VTL approach, but it was conceived in a time when disks were horribly expensive. It was also pretty clear to many of us that disks were going to be a whole lot cheaper in the near future, and this fundamental assumption wouldn't be valid for much longer.
I kept thinking in terms of disk as a direct target for a backup application. No modifications to the backup application. Native speed of sequential disks for both backup and restore. Tape positioned as a backup to the backup. Use the strengths of the underlying array (e.g. CLARiiON) for performance, availability, management, etc.
We ended up calling the concept a "disk library" to differentiate from the VTLs that had come before it. It was a different value proposition and offering, based on the emergence of lower-cost disk media.
... It's nice to see we're at 1,100+ customers, and still going strong.
For those new to the blogosphere, there is a difference between "Press Releases" as formalcorporate communications versus "Blog Posts" which are informal opinions of the individual blogger, whichmay or may not match exactly the views of their respective employer.As we've learned many times before, one should not treat termslike "first" or "leader" in corporate press releases literally! Let's explore each.
Was EDL the first "open systems" Virtual Tape Library?
This is implied by the Forrester report. Chuck mentions the "VTLs that had came before it" in his blog, and many people are aware that IBM and StorageTek had introduced mainframe-attached VTLs in the 1990s. But what about VTL for "open systems"?
(Hold aside for the moment that IBM System zmainframe is an open system itself, with z/OS certified as a bona fide UNIX operating system by the [the Open Group] standards body. Most analysts and research firms usually refer only to the non-mainframe versions of UNIX and Windows. Alternative definitions for "open systems" can be foundin [Web definitions or Wikipedia]. I will assume Forrester meantnon-mainframe servers.)
IBM announced AIX non-mainframe attachment via SCSI connectivity to the IBM 3494 Virtual Tape Server (VTS) on Feb 16, 1999, with general availability in May 28, 1999. That's nearly FIVE YEARS before the April 2004 introduction of EDL. IBM VTS support for Sun Solaris and Microsoft Windows came shortly thereafter in November 2000, and support for HP-UX a bit later in June 2001. One of my 17 patents is for the software inside the IBM 3494 VTS, so like Chuck, I can takesome pride in the success of a successful product.
(I don't remember if StorageTek, which was subsequently acquired by Sun, had ever supported non-mainframe operating systems with their Virtual Storage Manager[VSM] offering, but if they did, I am sure it was also before EMC.)
Last week, another EMC blogger, BarryB (aka [the Storage Anarchist]),took me to task in comments on my post [IBM now supports 1TB SATA drives]. He felt that IBM should not claim support, given that the software inside the IBM System Storage N series is developed by NetApp. He compared this to the situation of HP and Sun re-badging the HDS USP-V disk system. If someone else wrote the software, BarryB opines, IBM should not claim credit for it. I tried to explain how IBM provides added value and has full-time employees dedicated to N series development and support, butdoubt I have changed his mind.
Why do I bring that up? Because the EMC Disk Library runs OEM software from FalconStor. Basically EMC is assembling a hardware/software solution with components provided from OEM suppliers. Hmmm? Sound familiar? Who is calling the kettle black?
If there is a clear winner here, it is FalconStor itself.Perhaps one of the worst kept industry secrets is that FalconStor software is also used in VTL offerings from Sun, Copan, and IBM, the latter embodied as the [IBM TS7520 Virtualization Engine] offering. If you like the concept of an EDL,but prefer instead one-stop shopping from an "information infrastructure" vendor, IBM can offer the TS7520 along with servers, software and services for a complete end-to-end solution.
Can EMC claim to be "a leader" in Virtual Tape Libraries?
During the measured quarter, IBM shipped its 10 millionth LTO-4 tape drive cartridge to Getty Images, the world's leading creator and distributor of still imagery, footage and multi-media products, as well as a recognized provider of other forms of premium digital content, including music. Getty Images is using the LTO-4 drives as part of a tiered infrastructure of IBM disk and tape solutions that help support the backup needs of their digital imagery;
IBM shipped more than 1,500 Petabytes of tape storage in Q3'07 alone;
During Q3'07, IBM shipped the 10,000th IBM System Storage TS3500 Tape Library. The TS3500 is a highly scalable tape library with support from 1 to 192 tape drives and up to 6,400 cartridge slots for open system, mainframe and virtual tape system attachment.
Let's take a look at the numbers. IBM has sold over 5,400 virtual tape libraries. Sun/STK has sold over 4,000 virtual tape libraries. Both are drastically more than the 1,100 mentioned in Chuck's post. Does IDC recognize EMC in third place? No, EMC chooses instead to declare EDL as disk arrays (probably toprop up their IDC "Disk Tracker" numbers), so they don't even earn an honorable mention under the virtual tape librarycategory. This of course includes the number of mainframe-attached models from IBM and Sun/STK. So, if EMC did call these tape systems instead, they might showup in third place, and as such EMC could claim to be "a leader" in much the same way an athlete can claim to be an "Olympic medalist" winning the bronze for third place. (If you limit thecount to just the FalconStor-based models from IBM, EMC, Sun and Copan, then EMC moves up to first or second, but then press release titles like "EMC a Leader in FalconStor-based non-mainframe Virtual Tape Libraries" can get too confusing.)
Chuck, if you are reading this, I feel you have every right to celebrate your involvement with the EDL. Despite having common software and hardware components, both IBM and EMC can rightfully declare their own unique value-add through their respective VTL offerings. Like the IBM N series, the EMC Disk Library is not diminished by the fact the software was written by someone else. BarryB might disagree.
Last year, in my post [Inaugural Brand Impact 2007 Awards], I mentioned how IBM beat out other major storage vendors for the best brand "IBM System Storage". I am proud of this, and highlighted it as one of my team's key accomplishments during my brief20-month career in marketing, which I recapped in my post[Switching Over from What and Why] when I switched over to consulting.
This year, IBM did it again. For a second consecutive year, IBM System Storage was recognized by [Liquid Agency]as the leading brand for enterprise storage. Here is an excerpt from the [IBM Press Release]:
"IBM System Storage is the most trusted storage portfolio in the world, providing our clients leading disk, tape and storage software solutions and services. This award reflects IBM's priority in delivering information infrastructure solutions to solve our client's most critical storage challenges," said Barry Rudolph, Vice President, IBM System Storage. "We are helping clients -- from large corporations to small businesses -- intelligently manage information as a strategic business asset. We are proud to be recognized as the clear market leader in delivering solutions that help our clients manage and extract value from their information."
Liquid Agency reviewed over 250 technology brands to make this assessment.
The Business/IT Alignment category is critical for many companies; getting these two key divisions in sync provides a huge competitive advantage. This year’s winner – by a landslide – is IBM's [Innov8].
This Big Blue product has a touch of the sci-fi to it: it’s an interactive, 3-D business simulator intended to close the divide between IT staff and business executives. In other words, it’s…a video game. I guarantee you that in all the decades that Datamation has done its Product of the Year awards, never has a video game won. The times they are a-changin’.
Whether a server is the “best” server is, in truth, based on your company’s individual needs and budgets. In the server world, with its myriad options and add-ons, one size definitely does not fit all. That said, IBM p 570 Server must fit plenty of needs; the box easily won the Enterprise Server category. IBM claims this workhorse doubles the speed of its predecessor without requiring a larger energy footprint.
IBM Lotus Symphony
When it comes to total numbers of users, there’s no question that Microsoft Office is the 800-pound gorilla of this category. The deeply entrenched Office makes the corporate world go ‘round. Given Office’s status, it’s a major eyebrow raiser that this category was won by relative newcomer IBM Lotus Symphony. Perhaps it’s because Big Blue’s product is free (that always helps), or because IBM is itself such an established vendor. Whatever the case, consider this vote as a huge upset.
(Note: IBM Lotus Symphony is available for [free download] for Windows and Linux.When my friend purchased a new laptop that came pre-installed with Windows Vista, he was surprised to see that Microsoft Office was not included. I pointed him to Lotus Symphony, and he is running great with his existing Word, Powerpoint and Excel documents! I use Lotus Symphony on both Windows and Linux, and IBM plans to make a version available for Mac OS X-- when that happens, I have my Mac Mini G4 waiting to try it out.)
IBM Wireless Software for Business Intelligence (BI) on the go
For most of 2007, IBM Cognos 8 Go! Mobile software supported only Blackberry units. At the end of last year, Cognos upgraded its wireless business intelligence software – which delivers business reports to on-the-go staffers – to support handhelds that run Windows Mobile OS. Naturally, this expanded the company’s user base, and likely helped Cognos 8 Go! Mobile win the Wireless Software category.
(If you have a RIM Blackberry handheld device, you can try out this[actual demo].)
Wow! That's a lot of awards. Congratulations to all my IBM colleagues who made this happen!
However, I have to assume his real question is ... "what is the quick and easy way for me to build a lightweight database app like Microsoft Access that I can distribute as a standalone executable?"
To which I would say "Lotus has a program called Approach, which is part of Lotus SmartSuite, which some people still use. However, a lot of the focus in IBM now centers around the lightweight Cloudscape database which IBM acquired from Informix, which is now known as the [open source project called Derby]. Many IBM and Lotus products, such as Lotus Expeditor use the JDBC connection to Derby, which allows you to use Windows, Linux, Flash, etc. ... with no vendor lock in".
I am familiar with Cloudscape, and I evaluated it as a potential database for IBM TotalStorage Productivity Center, when I was the lead architect defining the version 1 release. It runs entirely on Java, which is both a plus and minus. Plus in that it runs anywhere Java runs, but a minus in that it is not optimized for high performance or large scalability. Because of this, we decided instead on using the full commercial DB2 database instead for Productivity Center.
Not to be undone, my colleagues over at DB2 offered a different alternative, [DB2 Express-C], which runs on a variety of Windows, Linux-x86, and Linux on POWER platforms. It is "free" as in beer, not free as in speech, which means you can download and use it today at no charge, and even ship products with it included, but you are not allowed to modify and distribute altered versions of it, as you can with "free as in speech" open source code, as in the case of Derby above (see [Apache License 2.0"] for details).
As I see it, DB2 Express-C has two key advantages. First, if you like the free version, you can purchase a "support contract" for those that need extra hand-holding, or are using this as part of a commercial business venture. Second,for those who do prefer vendor lock-in, it is easyto upgrade Express-C to the full IBM DB2 database product, so if you are developing a product intended for use with DB2, you can develop it first with DB2 Express-C, and migrate up to full DB2 commercial version when you are ready.
This is perhaps more information than you probably expected for such a simple question. Meanwhile, I am stilltrying to figure out MySQL as part of my [OLPC volunteer project].
Wrapping up my week on the Feb 12 announcements, I will finish off talking about thenew Half-High (HH) LTO4 drives available for our TS3100 and TS3200 tape libraries.
Small and medium sized business (SMB) clients are looking for small, affordable tapesystems. Tape is inherently green, using orders of magnitude less energy than disk,and is very scalable by simply purchasing more tape cartridges.
When IBM first announced them, the TS3100 supported one drive with 24 cartridges,and the TS3200 (see picture at left) supported two drives and 48 cartridges. Unlike disk, that mentions RAWcapacity and then lowers it to indicate usable capacity in RAID configurations, tapeis just the opposite. LTO4 cartridges have 800 GB raw capacity, but with an average of 2:1compression, can hold a usable 1.6 TB of data. LTO4 also supports WORM cartridges fornon-erasable, non-rewriteable (NENR) types of data, and encryption capability.
As a follow-on to our HH LTO3 drives, IBM is the first major storage vendor to offerthe new HH LTO4 drives in entry-level automation, which directly attach via 3Gbps SAS connections to your host servers. The HH models allows you to have two drives in the TS3100, and four drives in the TS3200.
You can mix and match, LTO3 and LTO4. Why would anyone do that? Well, the Linear Tape Open [LTO]consortium --made up of technology provider companies IBM, HP and Quantum--decided to support N-2 generation read, and N-1 generation read/write. So, anLTO3 can read LTO1 cartridges, and read/write LTO2 and LTO3 cartridges. TheLTO4 can read LTO2 cartridges, and read/write LTO3 and LTO4 cartridges. For SMBcustomers that still have some LTO1 cartridges they might want to read some day,mixing LTO3 and LTO4 is a viable combination.
Of course, IBM still offers full-high (FH) versions of LTO3 and LTO4, which offer a bit faster acceleration, back-hitch and rewind times than their HH counterparts, and also offer additional attachment choices of LVD Ultra160 SCSIand 4 Gbps Fibre Channel as well.
So, for SMB customers that are simply using their tape for backup and archive,and probably not driving maximum rated speeds, having twice as many slowerdrives might be just the right fit.
Today, I'll cover the announcements related to our IBM System Storage N series disk systems, which ties inwith Valentines Day theme nicely. The phrase we use for "unified storage" is that N series allows you to "share the closet, not necessarily the clothes". Couples recognize the value of a shared closet over having one closet for just the man's clothes, and a separate closet for just the woman's clothes. (For some couples, the man's closet would be terribly under utilized!). By analogy, the N series allows you to share one solution for LUNs that can be accessed via FCP or iSCSI protocols, and NAS file systems that can be accessed via NFS and CIFS protocols. In most data centers, Windows and UNIX applications are about as likely to share files as men and women are to wear each other's clothes, so the analogy is in tact.
Let's take a look at what got announced:
N7700 and N7900
There are actually [eight new high-end N series] models. the N7900 has 4 processors and 32GB of cache. The N7700 has 2 processors and 16GB cache. Each has two appliance models (A11 single node and A21 dual node) and two gateway models (G11 single node and G21 dual node).
The appliance models support both FC and SATA disk. The N7900 A models support a maximum of 1176 drives; the N7700 A models supports 840 drives. The gateway models provide FCP, iSCSI and NAS host access through external disk attachment. The N7900 gateway models support 1176 LUNs on external disk systems; the N7700 gateway models support 840 external LUNs.
N series now supports 1 TB SATA disk
The [EXN1000 expansion drawer] can now have up to fourteen 1TB SATA drives. This is in addition to previousannouncements supporting 500GB and 750GB drive capacities. These drawer support the entire N series line.
With 1 TB drives, the N7900 now supports up to 1176 TB of raw capacity, which is over 1PB of usabledata in 12+2P RAID-DP mode. This is greater than the internal disk capacity limits of current IBM DS8000, EMC DMX andHDS USP-V models.
At the low end, both the N3300 and N3600 now support 500GB, 750GB and 1TB SATA drives in addition to the SASdrives they supported.
SnapManager for Microsoft SharePoint
There is a new SnapManager in town. This one is for Microsoft SharePoint data. See the announcementfor the [N3300 and N3600] for details.
On Jan 24, IBM signed agreements with [Ingram Micro, Tech Data, and Synnex], to distribute the N Series products and work with IBM to recruit new solution providers to the line. These three are all well-respected world-class distribution providers, so weare glad to have increased our partnership with them on this.
Yesterday, I promised I would cover other products from the Feb 12 announcement. Today I will focus on the IBM SAN768B director. Some people are confused on the differences between switchesand directors. I find there are three key differences:
Directors are designed to be 24x7 operation, highly available with no single points of failure or repair. Generally, all components in directors are redundant and hot-swappable, including Control Processors. In switches, some components are redundant and hot-swappable, such as fans and power supplies), but not the “motherboard” or controller. Often you have to take down a switch to make firmware or major hardware changes or upgrades.
Directors are designed to take in "blades" with different features, port counts, or protocol capabilities. You can add or remove blades while the system is up and running. Switches have a fixed number of ports. (A Small Form-factor Pluggable optical transceiver [SFP] is the component that turns electric pulses into light pulses (and visa versa). You plug the SFP into the switch, and then the fiber optic cable is plugged into the SFP).
With switches, you often start with a base number of active ports, and then can enable the rest of the ports as you need them.
Directors have hundreds of ports. Switches tend to have 64 ports or less.
Last year, Brocade acquired McDATA. Both were OEMs for IBM, and IBM distinguished that in the naming convention. The IBM SAN***B name was used to denote products manufactured for IBM by Brocade, and a SAN***M name was used to denote products manufactured by McDATA.
At that time, Brocade and McDATA equipment did not mix very well on the same fabric, so IBM retained the naming convention so that you as a customer knew what it worked with.
Brocade now has released with new levels of both operating systems--Brocade's FOS and McDATA's EOS--and their respective fabric managers--Brocade Fabric Manager (FM) and McDATA's Enterprise Fabric Connectivity Manager (EFCM)--so that they have full interoperability.
Brocade's goal is to enhance EFCM to be a common software management platform for all of their products going forward.
IBM used the maximum port count in the name to provide some clue as to the size of the switch or director. The SAN16B-2 or the SAN32B-3 are switches that have a maximum of 16 and 32 ports. The SAN256B supports a maximumeight blades of your choosing.Two different types were supported for FC ports, a 16-port blade and a 32-port blade.If all eight were 32-port blades then the maximum was 256 ports, hence the name. But then Brocade began offering 48-port blades. Should IBM change the name? No, it decided to leave itthe SAN256B even though it can now have a maximum of 384 ports.
Not to confuse anyone, the SAN768B also has a maximum of 384 ports, in the same 14U dimensions, but with a special twist. Normally to connect two directors together you use up ports from each, in what are called "inter-switch links" (ISL).These are ports you are taking away from availability from the servers and storage controllers. The SAN768Boffers a new alternative called "inter-chassis links". Each SAN768B has two processing blades, and each has two ICL ports, so with just four two-meter (2m) cables, you get the equivalent of 128 FC 8 Gbps ISL links without using 128 individual ports on each side. That is like giving you 256 ports back for use with servers and storage!
Since IBM directors require 240 volt power, IBM TotalStorage SAN Cabinet C36 include power distribution units (PDUs). PDUs are just glorified power strips, but a new intelligent PDU (iPDU) option introduces additional intelligence to monitor energy consumption for customers looking to measure, and perhaps charge back, energy consumption to the rest of the business. You can stack two SAN768B in one cabinet, one on top of the other, and connected via ICLs, it wouldlook like one huge 768-port backbone.
As a backbone for your data center, the SAN768B is positioned for two emerging technologies:
8 Gbps Fibre Channel (FC)
The SAN768B is powerful enough to have 32-port blades run full speed on all ports off-blade without oversubscription. Oversubscription is an emotional topic.
Normally, blades (like switches) can handle all traffic at full speed without delays provided the in-bound and out-bound ports involved are all on the same blade. In a director, however, if you need to communicate from a port on one blade to a port on a different blade, it is possible that off-blade traffic might be constrained or delayed in its transit across the backplane.
On the SAN768B, both the 16-port and 32-port blades can run at full 8 Gbps speed, and the 48-port is exposed to oversubscription only if you have more than 32-ports running at full 8 Gbps transferring data off-blade concurrently.
The new 8 Gbps SFPs support auto-negotiation at N-1 and N-2 generation link speeds. This means that they will automatically slow down when communicating with 4Gpbs and 2 Gbps devices, but they cannot communicate with 1 Gbps devices. If you are still using 1 Gbps devices in your data center, you will need to use 4 Gbps SFPs (which also support 2 Gbps and 1 Gbps link speeds) to communicate with those older devices.
Basically, this new technology enables transport of Fibre Channel packets over 10 Gbps Ethernet links. This 10 Gbps Ethernet can also be used to carry traditional iSCSI and TCP/IP traffic. FCoE introduces new extensions to provide Fibre Channel characteristics, like being lossless, and offering consistent performance. The ANSI T11 team is driving FCoE as an open standard, and at the moment it is not fully baked. I suggest you don't buy any FCoE equipment prematurely, as pre-standard devices or host bus adapters could get you burned later when the standard is finalized.
The idea is that FCoE blades can be installed in a SAN768B along with traditional FC blades, allowing routing of traffic between traditional FC and new FCoE ports. Those who have invested in FCIP for long distance replication will be able to continue using either FC or FCoE inputs.
One of the big drivers of FCoE is IBM BladeCenter. Currently, most BladeCenter blades support both Ethernet and FC connectivity and are connected to both Ethernet and FC switches on the back of each BladeCenter chassis. With FCoE, we have the potential to run both FC and IP traffic across simpler all-Ethernet blades, connecting through all-Ethernet switches on the backs of each chassis.
For more information on the IBM SAN768B, see the [IBM Press Release]. For more detailson Brocade's strategy, here is an 8-page white paper on their[Data Center Fabric] vision.
It's Tuesday, and you know what that means-- IBM makes its announcements.
Today, IBM announced a variety of storage offerings, but I am going to just focus this poston just the new DR550 models. The DR550 is the leading disk-and-tape solution forstoring non-erasable, non-rewriteable (NENR) data. This type of data, often called fixed-contentor compliance data, was previously writtento Write-Once-Read-Only (WORM) optical media. However, Optical technology has not advanced as fastas magnetic recording, so disk and tape have taken over this role. While there are still a fewlaws on the books that mandate "optical media" as the storage solution, new laws like SEC 17a-4and Sarbanes-Oxley (SOX) allow for NENR solutions based on magnetic disk or tape instead.
As we had done for the IBM SAN Volume Controller (SVC), the DR550 was based on "off the shelf"components. The File System Gateway (FSG) was based on System x server, the DR550 hardwarebased on System p server and DS4000 disk arrays, with "hardened" versions of the AIX,DS4000 Storage Manager and IBM Tivoli Storage Manager (TSM) that we renamed the IBM SystemStorage Archive Manager (SSAM).
The DR550 is Ethernet-based, so it can be used with all IBM server platforms, from System xand BladeCenter, to System i, and System p, and even System z mainframe customers, as wellas non-IBM platforms from Sun, HP and others. There are two ways to get data stored ontothe DR550:
Sending archive objects via the SSAM archive API. This is an API based on the XBSA open standardthat many applications have coded to.
Writing files via standard CIFS and NFS protocols through the File System Gateway (FSG), an optional priced feature that you can have incorporated into the DR550.
Generally, business applications like SAP or Microsoft Exchange don't do this directly, but ratheryou have an "archive management application" that acts as the go-between broker. IBM offers IBM Content Manager, IBM CommonStore for eMail (Exchange and Lotus Domino), and IBM CommonStore for SAP.IBM also recently acquired FileNet and Princeton Softech that provide additional support. Third partyproducts like Zantaz and Symantec KVS Enterprise Vault have also passed System Storage Provencertification for the DR550. These go-between applications understand the underlying storagestructure of their respective applications, and can apply policies to extract database rows, individualemails, or other attachments, as appropriate, and either move or copy them into the DR550.
The DR550 has built in support to move data from disk to tape, through policy-based automation behind the scenes. This is the key differentiator fromdisk-only solutions. Rather than filling up an EMC Centera, and watching it sit there idle burning energyfor five to seven years, or however long you are required to keep the data, you can instead use the disk for the most recent months worth of data on a DR550. The DR550 attaches to tapedrives or libraries, not just IBM TS1120 or LTO based models, but hundreds of systems from other vendorsas well. You can combine this with either rewriteable or WORM tape cartridge media, depending on yourcircumstances. This can be directly cabled, or through a SAN fabric environment. Storing the bulk ofthis rarely-referenced data on tape makes the DR550 substantially more affordable and more green thandisk-only alternatives.
Let's take a look at the specific models:
IBM System Storage DR550 DR1
The DR1 machine-type-model replaces the "DR550 Express" for small and medium size business workloads. This is a singleSystem p server with anywhere from 1 to 36 TB of raw disk capacity in a nice lockable 25U cabinet (see picture at left). On the original DR550 Express, the 25U cabinet was optional, but so many people opted for it, that wemade it standard feature. You can add the File System Gateway, which is a System x running Linuxwith NFS and CIFS protocols converted to SSAM API calls.
IBM System Storage DR550 DR2
The DR2 machine-type-model replaces the larger "DR550" for enterprise workloads. This can be either a single or dual node System p configuration, anywhere from 6 to 168 TB in raw disk capacity, in a lockable 36U cabinet. This also allows for an optional File System Gateway, and in the case of thedual node configuration, you can have two System p servers, and two System x servers with two Ethernetand two SAN switches for complete redundancy.
Common Information Model (CIM) and SMI-S interfaces have been added so that IBM Director can providea "single pane of glass" to manage all of the components of the DR550.
The system is based on high-capacity 750GB SATA drives, installed in half-drawer (eight drives, 6 TB)and full-drawer (16 drives, 12 TB) increments. Your choices will be 7+P RAID5 or 6+P+Q RAID6.Here is an Intel article that explains [RAID6 P+Q].In the future, as new disk technologies are introduced, the DR550 supports moving the disk datafrom old to new seamlessly, without disrupting the data retention policies enforcement.
For more information, here is a [6-page brochure] thathas specifications for both the DR1 and DR2 models.
Rich Bourdeau has written a nice article on InfoStor titled [Software as a Service (SaaS) meets Storage]. Last year, IBM acquired Arsenal Digital, and he mentions both in this article.It is interesting how this has evolved over the years.
Rent warehouse space for tapes
I remember when various companies offered remote storage for tapes. These would be temperature and humidity-controlledrooms, with access lists on who could bring tapes in, who could take tapes out, and so on. In the event of thedisaster, someone would collect the appropriate tapes and take them to a recovery site location.
Rent online/nearline storage from a Storage Service Provider (SSP)
SSPs rented storage space on disk, or provided automated tape libraries that could be written to. With tapes being ejected and stored in temperature/humidity-controlled vaults. Electronic vaulting eliminates a lot of theissues with cartridge handling and transportation, is more secure, and faster. Rented disk space, based on a Gigabytes-per-month rate, could be used for whatever the customer wanted. If these were for backups or archive,then the customer has to have their own software, to do their own processing at their own location, sending the data to the remote storage as appropriate, and manage their own administration.
Backup-as-a-Service and Archive-as-a-Service
We are now seeing the SaaS model applied to mundane and routine storage management tasks. New providers can offerthe software to send backups, the disk to write them to, and as needed the tape libraries and cartridges to rollover when the disk space is full. Disk capacity can be sized so that the most recent backups are on immediately accessible for fast recovery.
The same concept can be applied to archives. The key difference between a backup and an archive is that backups areversion-based. You might keep three versions of a backup, the most recent, and two older copies, in case something is wrong with the most recent copy, you can go back to older copies. This could be from undetected corruption of the data itself, or problems with the disk or tape media. An archive, on the other hand, is time-based. You want this data to be kept for a specific period of time, based on an event or fixed period of years.
Since BaaS and AaaS providers know what the data is, have some idea of the policies and usage patterns will be, can then optimize a storage solution that best meets service level agreements.
Many people have asked me if there was any logic with the IBM naming convention of IBM Systems branded servers. Here's your quick and easy cheat sheet:
System x -- "x" for cross-platform architecture. Technologies from our mainframe and UNIX servers were brought into chips that sit next to the Intel or AMD processors to provide a more reliable x86 server experience. For example, some models have a POWER processor-based Remote Supervisor Adapter (RSA).
System p -- "p" for POWER architecture.
System z -- "z" for Zero-downtime, zero-exposures. Our lawyers prefer "near-zero", but this is about as close as you get to ["six-nines" availability] in our industry, with the highest level of security and encryption, no other vendor comes close, so you get the idea.
But what about the "i" for System i? Officially, it stands for "Integrated" in that it could integrate different applications running on different operating systems onto a [COMMON] platform. Options were available to insert Intel-based processor cards that ran Windows, or attach special cables that allowed separate System x servers running Windows to attach to a System i. Both allowed Windows applications to share the internal LAN and SAN inside the System i machine. Later, IBM allowed [AIX on System i] and [Linux on Power] operating systems to run as well.
From a storage perspective, we often joked that the "i" stood for "island", as most System i machines used internal disk, or attached externally to only a fewselected models of disk from IBM and EMC that had special support for i5/OS using a special, non-standard 520-byte disk block size. This meant only our popular IBM System Storage DS6000 and DS8000 series disk systems were available. This block size requirement only applies to disk. For tape, i5/OS supports both IBM TS1120 and LTO tape systems. For the most part,System i machines stood separate from the mainframe, and the rest of the Linux, UNIX and Windows distributed serverson the data center floor.
Often, when I am talking to customers, they ask when will product xyz be supported on System z or System i?I explained that IBM's strategy is not to make all storage devices connect via ESCON/FICON or support non-standard block sizes, but rather to get the servers to use standard 512-byte block size, Fibre Channel and other standard protocols.(The old adage applies: If you can't get Mohamed to move to the mountain, get the mountain to move to Mohamed).
On the System z mainframe, we are 60 percent there, allowing three of the five operating systems (z/VM, z/VSE and Linux) to access FCP-based disk and tape devices. (Four out of six if you include [OpenSolaris for the mainframe])But what about System i? As the characters on the popular television show [LOST] would say: It's time to get off the island!
Last week, IBM announced the new [i5/OS V6R1 operating system] with features that will greatly improve the use of external storage on this platform. Check this out:
POWER6-based System i 570 model server
Our latest, most powerful POWER processor brought to the System i platform. The 570 model will be the first in the System i family of servers to make use of new processing technology, using up to 16 (sixteen!) POWER6 processors (running at 4.7GHZ) in each machine.The advantage of the new processors is the increased commercial processing workload (CPW) rating, 31 percent greater than the POWER5+ version and 72 percent greater than the POWER5 version. CPW is the "MIPS" or "TeraFlops" rating for comparing System i servers.Here is the[Announcement Letter].
Fibre Channel Adapter for System i hardware
That's right, these are [Smart IOAs], so an I/O Processor (IOP) is no longer required! You can even boot the Initial Program Load (IPL) direclty from SAN-attached tape.This brings System i to the 21st century for Business Continuity options.
Virtual I/O Server (VIOS)
[VirtualI/O Server] has been around for System p machines, but now available on System i as well. This allows multiplelogical partitions (LPARs) to access resources like Ethernet cards and FCP host bus adapters. In the case of storage, the VIOS handles the 520-byte to 512-byte conversion, so that i5/OS systems can now read and write to standard FCP devices like the IBM System Storage DS4800 and DS4700 disk systems.
IBM System Storage DS4000 series
Initially, we have certified DS4700 and DS4800 disk systems to work with i5/OS, but more devices are in plan.This means that you can now share your DS4700 between i5/OS and your other Linux, UNIX and Windowsservers, take advantage of a mix of FC and SATA disk capacities, RAID6 protection, and so on.
To call [IBM PowerVM] the "VMware for the POWER architecture" would not do it quite justice. In combination with VIOS, IBM PowerVM is able to run a variety of AIX, Linux and i5/OS guest images.The "Live Partition Mobility" feature allows you to easily move guest images from one system to another, while they are running, just like VMotion for x86 machines.
And while we are on the topic of x86, PowerVM is also able to represent a Linux-x86 emulation base to run x86-compiled applications. While many Linux applications could be re-complied from source code for the POWER architecture "as is", others required perhaps 1-2 percent modification to port them over, and that was too much for some software development houses. Now, we can run most x86-compiled Linux application binaries in their original form on POWER architecture servers.
BladeCenter JS22 Express
The POWER6-based [JS22 Express blade] can run i5/OS, taking advantage of PowerVM and VIOS to access all of the BladeCenterresources. The BladeCenter lets you mix and match POWER and x86-based blades in the same chassis, providing theultimate in flexibility.
While many are just becoming familiar with the end-user interfaces of Web 2.0, from blogs and wikis to FaceBook and FlickR, fewer may be familiar with the "information infrastructure" of servers and storagebehind the scenes.
Last year, I bought an XO laptop under the One Laptop Per Child [OLPC] foundation's Give-1-Get-1 program and posted my impressions on this blog. One in particular, my post[Printingon XO laptop with CUPS and LPR] showed how to print from the XO laptop over to a network-attached printer.This caught the attention of the OLPC development team, who asked me tohelp them with another project as a volunteer. Before accepting, I had to learn what skills they were really looking for, especially since I do notconsider myself an expert in neither printing nor networking.
(Unlike a regular 9-to-5 job where most people just try to look busy for eight hours a day, doingvolunteer work means being ready to ["roll up your sleeves"] and actuallyaccomplish something. This applies to any kind of volunteer work, from hammering nails for [Habitat for Humanity] to sorting cans at the [Community Food Bank].Best Buy uses the phrase "Results Oriented Work Environment" [ROWE] to describetheir latest program, modeled in part after the mobile workforce policies of Web2.0-enlightened companiesIBM and Sun, but that is perhaps a topic for another blog post!)
Apparently, to support a school full of students with XO laptops, it would be nice to have a few serversthat provide support to manage the class lesson plans, make reading materials and other content available,and keep track of results. What they need is an "information infrastructure"! They decided on two specific servers:
School Server -- this would run a popular class management system called [Moodle]
Library Server -- a server for a digital library collection, based on Fedora Commons[16-minute video]
In keeping with OLPC philosophy to use free and open source software[FOSS], both servers are based on the [LAMP] platform. LAMP is an acronym for thecombined software bundle of Linux, Apache, MySQL and a Programming language like PHP. The "XS" team working onthe school server wanted me to build a LAMP server and install Moodle to help test the configuration, determinewhat other software is required, and perhaps develop a backup/recovery scenario. Basically, they needed someone with Linux skills to put some hardware and software together.
(I am no stranger to Linux. Back in the 1990s, I was part of the Linux for S/390 team, led the effort to createthe infamous "compatible disk layout" (CDL) that allows z/OS to access ESCON and FICON-attached Linux volumes,took my LPI certification exam, and led a team to validate FCP drivers for our disk and tape storage systems. For an IBMer to volunteer foran Open Source community project, you have to take an "open source" class and get management approval to reviewfor any possible "conflicts of interest". I got this all taken care of, and accepted to help the XS team.)
Building a test environment is similar to baking a cake. You have a recipe, utensils, and ingredients. Here'sa bit of description of each of the ingredients:
Like Windows, the Linux operating system comes in different flavors to run on handhelds, desktops and servers. For servers, IBM tends to focus on Red Hat Enterprise Linux (RHEL) and SUSE Linux Eneterprise Server (SLES). However, the XS team decidedinstead to use [Fedora 7], a community-supported version from Red Hat. Earlier versions of Fedora were known as "Fedora Core", but apparently with version 7, the word "Core" has been dropped. Fedora 7 can be used in either desktop or server mode.
[Apache] is web server software, and half of all web servers on the internet use it. It competes head-on against Micorosofts Internet Information Services (IIS) serverprovided in Windows 2003. The Apache name is partly from thefact that its origins were "a patchy" variant of the NCSA HTTPd 1.3 codebase. Thepopular [IBM HTTP Server] is poweredby Apache, with added support to the rest of the IBM WebSphere software portfolio. The XS team chose Apache v2as the web server platform.
[MySQL] is a relational database management system (RDBMS) software, similar to commercial products like IBM DB2 Universal Database, Oracle DB, or Microsoft SQL Server. The SQL stands for Structured Query Language, developed by IBM in the early 1970s as a standard languageto update and query database tables. MySQL comes in two flavors, MySQL Enterprise for commercial use, and MySQLCommunity, which is community-supported. There are over 10 million instances of MySQL running websites on the internet, which helps explain why Sun Microsystems agreed to acquire MySQL AB company last month.The XS team decided on MySQL 5.0 as the database platform.
To make HTML pages dynamic, including the possibility to add or query database contents, requires programming.A variety of web scripting languages were developed, all starting with the letter "P" to claim to be the programming part of the LAMP platform, including [PHP], Perl, and Python. Later, new programming language frameworks have been developed that do not start with the letter "P", like [Ruby on Rails]. PHP is short for PHP: Hypertext Preprocessor which explains that it pre-processes HTML during web serving,looking for special tags indicating PHP code, allowing programming logic to insert HTML content, such as information extracted from a database.While Python is the language that runs the Sugar interface on the XO laptops, the XS team decided onPHP v5 as the programming language for the server.
As for utensils, you only need a few utilities
A simple text editor: I go old-school and use the classic "vi" (to learn this editor, see the["Cheat Sheet" method] on IBM Developerworks)
secure socket shell (SSH): this allows you to access one server from another
browser access to the internet: when you encounter problems, get error messages, or whatever, it pays to know how to search for things with Google
As for a recipe, the Moodle website spells out some unique details and parameters. For the base LAMP platform,I chose to follow the book [Fedora 7 Unleashed] that has specific chapters on setting up SSH, Apache, MySQL, PHP, Squid and so on. The resultingconfiguration looks like this:
Here were the sequence of events:
I took an old PC that I wasn't using anymore, backed up the Windows system, and installed Linux on top. Thebook above had a Fedora 7 DVD on the back jacket, but I used the [OLPC LiveCD] that had some values pre-configured.
Set the IP address static. I set mine to 192.168.0.77 which nobody sees except my other systems.
My school server is "headless" which means it does not have its own keyboard, video or mouse. It also runs only to Linux run level 3, command line interface only, no graphics.I was able toshare using a KVM switch], but this meant having to remember something on one screen while I was switching over to the other. My Windows XP system has mybrowser connection to the internet to follow instructions or read error messages, so I need that up all thetime. To get around this, on my Windows XP system,I generated SSH public and private keys, copied the public key over to my new Linux system, and used [OpenSSH for Windows] to connect over. Now, on one screen,I have my Windows XP Firefox browser, and a separate command line window that is accessing my Linux schoolserver.
With SSH up and running, I can now use "vi" to edit files, and issue commands to install or activatethe remaining software. First up, Apache. I got this working, and from Windows XP, verified that going to"http://192.168.0.77" showed the Apache test screen.
I installed PHP, and tested it with a simple short index.php file.
I installed MySQL, setup the base "installation databases", and created a test database. Here is whereyou might want to set a password for the MySQL root user, but I chose to do that later for now.
I installed Moodle. It was smart enough to check that Apache, PHP, and MySQL were operational, andapparently I missed a few special "PHP" modules that had to be linked in. I was able to find them, downloadthem, and get them installed.
I brought up Moodle, created a "class category" of SCIENCE and a new class "Chemistry 101", and it allworked.
I also activated Squid, which is a web proxy cache server that stores web pages for faster access.
Another idea was to activate Samba, to provide CIFS file and print sharing, but I decided to put this off.
I got all of this done last Saturday, start to finish. Now the fun begins. We are going to run throughsome tests, document the procedures, and try to get a system up and running in a remote school in Nepal. Fornow, I have only one XO laptop to simulate what the student sees, and one laptop that can represent eithera teacher's Windows-based laptop, or run QEMU and emulate a second XO laptop.For tuning, I might go through the procedures mentioned on IBM Developerworks "Tuning LAMP"[Part 1, Part 2,Part 3].
For those in the server or storage industry that need to understand Web 2.0 information infrastructure better,building a LAMP server like this can be quite helpful.
Are you covering the business impact of the internet failure across Asia, the Middle East and North Africa? The outage has brought business in those regions to a standstill. This disaster shines a direct spotlight on the vulnerability of technology and serves as a reminder of the ever increasing importance of protecting business critical information.
Disaster recovery needs to be a critical element of every technology plan. We don’t yet know the financial impact of this wide spread internet failure, but the companies with disaster recovery plans in place, were likely able to failover their entire systems to servers based in other regions of the world.
When I first heard of this outage, I am thinking, so a few million people don't have access to FaceBook and YouTube, what's the big deal? We in the U.S.A. are in the middle of a [Hollywood writer's strike] and don't have fresh new television sitcoms to watch! Yahoo News relays the typical government's response:[Egypt asks to stop film, MP3 downloads during Internet outage], presumably so that real business can take priority over what little bandwidth is still operational. Fellow IBM blogger "Turbo" Todd Watson pokes fun at this, in his post[Could Someone Please Get King Tutankhamun On The Phone?].Like us suffering here in America, perhaps our brothers and sisters in Egypt and India may getre-acquainted with the joys of reading books.
However, the [Internet Traffic Report-Asia] shows how this impacted various locations including: Shanghai, Mumbai, Tokyo, Tehran, and Singapore. In some cases, you have big delays in IP traffic, in other cases, complete packet loss, depending on where each country lies on the["axis of evil"].This is not something just affecting a few isolated areas, the impact is indeed worldwide. This would be a goodtime to talk about how computer signals are actually sent.
DWDM takes up to 80 independent signals, converts each to a different color of light, and sends all the colors down a single strand of glass fiber. At the receiving end, the colors are split off by a prism,and each color is converted back to its original electrical signal.
Similar DWDM, but only eight signals are sent over the glass fiber. This is generally cheaper, becauseyou don't need highly tuned lasers.
Wikipedia has a good article on [Submarine Communications Cable],including a discussion on how repairs are made when they get damaged or broken.It is important to remember that lost connectivity doesn't mean lost data, just lack of access to the data. Thedata is still there, you just can't get to it right now. For some businesses, that could be disruptive to actualoperations. In other cases, it means that backups or disk mirroring is suspended, so that you only have yourlocal copies of data until connectivity is resumed.
When two cables in the Mediterranean were severed last week, it was put down to a mishap with a stray anchor.
Now a third cable has been cut, this time near Dubai. That, along with new evidence that ships' anchors are not to blame, has sparked theories about more sinister forces that could be at work.
For all the power of modern computing and satellites, most of the world's communications still rely on submarine cables to cross oceans.
It gets weirder. In his blog Rough Type, Nick Carr's[Who Cut the Cables?] reportsnow a fourth cable has been cut, in a different location than the other two cable locations. If the people cuttingthe cables are looking to see how much impact this would have, they will probably be disappointed. Nick Carrrelates how resilient the whole infrastructure turned out to be:
Though India initially lost as much as half of its Internet capacity on Wednesday, traffic was quickly rerouted and by the weekend the country was reported to have regained 90% of its usual capacity. The outage also reveals that the effects of such outages are anything but neutral; they vary widely depending on the size and resources of the user.
Outsourcing firms, such as Infosys and Wipro, and US companies with significant back-office and research and development operations in India, such as IBM and Intel, said they were still trying to asses how their operations had been impacted, if at all.
Whether it is man-made or natural disaster, every business should have a business continuity plan. If you don't have one, or haven't evaluated it in a while, perhaps now is a good time to do that. IBM can help.
According to Gartner data (from 2005!), host-based storage accounts for 34 percent of the overall market for external storage, with the remaining 66 percent going to "fabric-attached" (network) storage, expect this share to grow from 66 percent to 77 percent by 2007.What is the current reality? SAN vs. NAS, FC vs iSCSI?
IBM subscribes to a lot of data from different analysts, they all have their methods for collecting this data, from taking surveys of customers to reviewing financial results of each vendor. While theymight not agree entirely, there are some common threads that lead one to believe they represent "reality". Hereare some numbers from an IDC December 2007 report:
Worldwide Disk Storage
While the 32/68 split is similar to the 34/66 split you mentioned before, you can see that external growth isgrowing faster, so internal host-based storage will drop to 25 percent by 2011, with external storage growing to 75 percent, very close to the 77 predicted. Looking at just the externaldisk storage, there are basically three kinds: DAS (direct cable attachment), NAS (file level protocols suchas NFS, CIFS, HTTP and FTP), and SAN (block-level protocols like FC, iSCSI, ESCON and FICON):
Worldwide External Disk Storage
At these rates, fabric-attached (SAN and NAS) will continue to dominate the storage landscape.Looking more closely now at the block-oriented protocols.
Worldwide External Disk Storage
Fibre Channel (FC)
At these rates, iSCSI will overtake FC by 2011. IBM System Storage N series, DS3300 and XIV Nextraall support iSCSI attachment.
Jon Toigo over at DrunkenData offers some additional data from ex-STKer:[Fred Moore Outlook on Storage 2008]. I met Fredat a conference. He had left STK back in 1998, and started his own company called Horison. NeitherJon nor Fred cite the sources of his statistics, but the following comment leads me to assume hehasn't been paying attention closely to the tape market:
With the demise of STK, who will be the leader in the tape industry?
Depending on how old you are, you might remember exactly where you were when a significant eventoccurred, for example the[Space Shuttle Challenger]explosion. For many IBMers, it was the day our friends at Sun Microsystems announced they were [puttingour lead tape competitor out of its misery]. I was in New York that day, but there was still someconfetti on the floor in the halls of the IBM Tucson lab when I got home a few days later. IBM hasbeen the number one market share leader in tape for over the past four years.
Last July, IBM and EMC traded blog postings over SPC-1 benchmark results. Fellow EMC bloggerChuck Hollis wrote his post [Does Anyone Take The SPC Seriously?]. Here is an excerpt:
I think most storage users have figured this out. We've never done an SPC test, and probably will never do one. Anyone is free, however, to download the SPC code, lash it up to their CLARiiON, and have at it.
I responded with [Getting Under EMC Skin], and then followed up with a series explaining IBM SVC and SPC benchmarks here:
So what is the good news?Yesterday, our friends at NetApp took up Chuck's challenge and posted results on their FAS3040 as well as their EMC CLARiiON devices. IBM sells the FAS3040 under the name IBM System Storage N5300 disk system. Knowing that NetApp maintains excellent performance when it is doing point-in-time copies, NetApp ran both with and without on both boxes. I include DS4700 and DS4800 as well for comparison purposes, but only have them without FlashCopy running.
NetApp FAS3040 (IBM N5300)
NetApp FAS3040 (IBM N5300)
EMC CLARiiON CX3-40
IBM DS4700 Express
EMC CLARiiON CX3-40
One would expect some performance degradation with a box running point-in-time copies at the same time it is reading and writing data, but NetApp/IBM N5300 does not degrade by much, but EMC's drops a significant amount.
So what is the bad news? Last October, I welcomed HDS USP-V to the [Super High-End Club], but now we need to invite Texas Memory Systems as well.In 2006, I posted [Hybrid, Solid State and the future of RAID], and poked fun at Texas Memory Systems using the slogan "World's Fastest Storage", which at the time that honor belonged to IBM SAN Volume Controller instead.The VP of Texas Memory Systems, Woody Hutsell, explained the only reason their solid-state disk system, RAMSAN-320, didn't have faster results is that they didn't have the fastest IBM server to run against it. It may not surprise you that nearly everyone's SPC benchmarks use IBM servers because IBM has the fastest servers as well. I didn't have a million-dollar System p UNIX server to send Woody for this, but it looks like they have finally gotten one, and a new RAMSAN-400 device, as they have posted their latest results.
Texas Memory Systems RAMSAN-400
IBM SAN Volume Controller 4.2
EMC doesn't publish numbers for their Symmetrix box, despite their announcement of faster SSD drives. They claim that SSD drives make their overall disk system performance faster, but without SPC benchmarks, we will never know. If you have a Symmetrix, this YouTube video may help you decide where it belongs:
IBM came out with their latest "5 in 5". These are five predictions for technologies that will havean impact over the next five years, summarized on 5 pages. Before I give my take on this year's set,here is a quick recap of[Last Year's 5 in 5]:
3-D representations of the human body to improve health care
This prediction is based on the idea that most medical mistakes result from lack of informationabout the patient. A 3-D avatar of the patient would allow the doctor to click on the section ofthe body, and this would trigger retrieval of patient records, relevant X-rays, MRI images, and so on.For example, IBM System Storage Grid Medical Archive Solution (GMAS) provides the storage that wouldallow any doctor to access these records, even if the image was taken at a different facility.
Unfortunately, this prediction only applies to patients who can actually afford to see a doctor. Apparently,no amount of technology, no matter how cool it is, can convince governments to make health care somethingeveryone has access to. Michael Moore has done a good job explaining this in his film documentary [Sicko].
Digital passport for food
Using RFID tags and second generation barcodes, you will have access to details of a food's origin,transportation conditions, and impact to the environment. Much of this information is already gathered,just not stored in a database accessible to the consumer.
Last year, the term "locavore" was the2007 Word of the Year for the Oxford American Dictionary, referring to people who limit what they eatto food produced within a certain radius, from family farms and locally-owned businesses.Here is an excerpt from a [Locavores] website:
Our food now travels an average of 1,500 miles before ending up on our plates. This globalization of the food supply has serious consequences for the environment, our health, our communities and our tastebuds.
Certainly, I am all for selling storage capacity to the food industry to help store vasts amount ofinformation for this, and certainly some people will be able to make smarter decisions based on thisinformation. This is not the first time this idea came up. The U.S. Food and Drug Administration introduced [nutrition labeling requirements] on thehope that people would choose more healthier foods. Despite this, people still opt for white bread, iceberg lettuce, and processed meats, so possibly having more information about where food comes from, and how it was transported, may not mean much to some consumers.
Technology to manage your own carbon footprint
"Smart energy" technologies allow you to walk the talk, by managing your own carbon footprint inyour home. For example, if you forgot to turn off the heat or air conditioner before leaving thehouse on your commute to work, your home would call your mobile phone, so that you can turn aroundand go back and correct that mistake. Better yet, IBM is working with others to provide web-enabledelectric meters that would allow you to turn off systems from work or cell phone browser.
Of course, such technology already exists for the data center. IBM Systems Director Active EnergyManager (AEM) allows you to monitor the actual usage of your servers and storage devices, and insome cases make adjustments to control energy consumption. This can feed into the IBM TivoliUsage and Accounting Manager software to incorporate energy usage as part of the charge-backcalculations. See the [IBM Press Release] formore details.
Cars that drive themselves
Not only will cars that drive themselves reduce the number of drunk-driving accidents, it canalso help reduce congestion in big cities, by routing traffic to different directions, based onGPS and presence-aware technologies. Stockholm (Sweden) has already reduced peak hour traffic by 20 percentusing this approach.
While I admire the concept, cars are perhaps the least energy-efficient mode of transportation.Often, a family can only afford a single vehicle, and it is purchased based on the worst-case scenario.A friend of mine has only two children, but a sever-person mini-van that gets only 17 MPG. Why suchan energy-inefficient vehicle? Because she occasionally drives her daughter and her friends tosoccer practice, and that represents the worst-case scenario, minimizing the parent/child ratio. Theother 99 percent of the time, she is driving by herself, or with one child, and consuming a lot ofgasoline in the process.
A better approach would be to find technology that connects airports, trains, buses and light rail forpublic transportation to greatly reduce the need to drive a car in the first place.
The idea that a family can have only one vehicle plays in the storage arena as well. Larger companiescan afford to have different storage for different workloads. The IBM System Storage DS8000 high-end disk system for their large OLTP anddatabase workloads, an XIV Nextra for their Web 2.0 storage needs, DR550 to hold their compliance data,and so on. Smaller companies are often tasked to find a single solution for all their needs, andfor them, IBM offers the IBM System Storage N series, providing a "unified storage" platform.
Increased dependence on cell phones
Before the cell phone, the last don't-leave-home-without-it technology most of us carried was the credit card. Now, IBM predicts that we will be even more dependent on our cell phones, becoming our banker, ticket broker, and shopping buddy.For example, you could use your cell phone to take a picture of a shirt at the mall, and it will then show you what youwould look like wearing that shirt, on a 3-D avatar representation of yourself, or perhaps your spouse, and getinformation on what discounts are available, or where else the shirt is being offered.
None of this example actually uses the "phone" part of the cell phone, however the cell phone is one device thatnearly everyone carries, so it becomes the development platform for all other technologies to be based on.
The common theme running through these is that it can be helpful to store more information than we do today,provided we make it accessible to the people who need it to make better decisions.
Last week, I got the following comment from Bob Swann:
I am looking for the IBM VM Poster or a picture of the IBM VM "Catch the Wave"
Do you know where I might find it?
Well, Bob, I made some phone calls. The company that published these posters no longer exists, butI found a coworker at the Poughkeepsie Briefing Center who still had the poster on his wall, and he was kind enough to take a picture of it for you.
VM: The Wave of the Future (click thumbnail at left to see larger image)
Some may recognize this as a [mash-up] using as a base the famous Japanese 10-inch by 15-inch block print[The Great Wave off Kanagawa] byartist [Katsushika Hokusai]. I had this as my laptop'swallpaper screen image until last year when I was presenting in Kuala Lumpur, Malaysia. I was told that it reminded people about the horrible tsunami caused by the [Indian Ocean earthquake] back in 2004.I was actually scheduled to fly the last week of December 2004 to Jakarta, Indonesia, but at the last minute ourclient team changed plans. I would have been on route over the Pacific ocean when the tsunami hit, and probably stranded over there for weeks or months until the airports re-opened.
The Wave theme was in part to honor the IBM users group called World Alliance VSE VM and Linux (WAVV) which is havingtheir next meeting [April 18-22, 2008] in Chattanooga, Tennessee. I presentedat this conference back in 1996 in Green Bay, Wisconsin, as part of the IBM Linux for S/390 team. It started onthe Sunday that Wisconsin switched their clocks for [DaylightSaving Time], and the few of us from Arizona or other places that don't both with this, all showed up forbreakfast an hour early.
When I was in Australia last year, I was told the wave that sports fans do, by raising their hands in coordinatedsequence, was called the [Mexican Wave]in most other countries. When I was there, Melbourne was trying to outlaw this practice at their cricket matches.
The "wave" represents a powerful metaphor, from z/VM operating system on System z mainframes to VMware and Xenon Intel-based processor machines, as the direction of virtualization that we are heading for future data centers.The Mexican wave represents a glimpse of what humans can accomplish with collaboration on a globalscale. It can also represent the tidal wave of data arising from nearly 60 percent annual growth instorage capacity. (I had to mention storage eventually, to avoid being completely off-topic on this post!)
I hope this is the graphic you were looking for Bob. If anyone else has wave-themed posters they would like to contribute, please post a comment below.
While EMC bloggers garnered media attention last year pointing out the faulty mathematics from HDS, an astute reader pointed me to EMC's own [DMX-4 specification sheet],updated for its 1TB SATA disk.I've chosen just the minimum and maximum number of drives RAID-6 data points for non-mainframe platforms:
In the first two rows, the numbers appear as expected. For example, 96 drives would be 12 sets of 6+2 RAID ranks, meaning 72 drives' worth of data, so nearly 36TB for 500GB drives, and nearly 72TB for 1TB drives. With 14+2 RAID-6, thenyou would have 84 drives' worth of data, so 42TB and 84TB respectively match expectations.
Where EMC appears miscalculating is having 20x more drives, as the numbers don't match up. For 1920 drives inRAID-6, you would expect 20x more usable capacity than the 96 drive configurations. For 6+2 configurations, one would expect 720TB and 1440TB respectively. For 14+2 configurations, one wouldexpect 840TB and 1680TB, respectively.
Perhaps EMC DMX-4 can't address more than 600TB for the entire system? Does EMC purposely limit the benefitsof these larger drives? It does question why someone might go from 500GB to 1TB drives, if the maximum configuration only gives about 40TB more capacity.Fellow IBM blogger Barry Whyte questioned the use of SATA in an expensive DMX-4 system, in his post[One Box Fits All - Or Does It], and now perhaps there are good reasons to question 1TB from a capacityperspective as well.
Today is Tuesday, a good day for announcements and good news!
This week I am in Guadalajara, Mexico, and the focus in Mexico is Small and Medium sized Business (SMB). SmallBusinessComputing.COM put out their [2008 Awards: The Absolute Best in Small Business], and IBM disk and server systems were recognized. Here is an excerpt:
Network Storage As companies expand, so does the data, and often at an alarming rate. Adding dedicated storage to your network can ease both system performance and efficiency woes, making your work life a bit easier.
This year, 42 percent our readers cast their lot with the [IBM System Storage DS3400]. The $6,495 system supports 12 hard disk drives for capacity of up to 3.6 terabytes a good match for tasks such as managing databases, e-mail and Web serving.
Last year's winner, NetApp, takes a very respectable runner-up slot for the NetApp Store Vault S300, a $3,000 storage appliance that offers security, scalability, data protection and simplified management.
Also, IBM's SMB departmental machine, the [System i515 Express] was named runner-up for servers.
This week I'm in beautiful Guadalajara, Mexico teaching at our[System Storage Portfolio Top Gun class].We have all of our various routes-to-market represented here, including our direct sales force, our technicalteams, our online IBM.COM website sales, as well as IBM Business Partners.Everyone is excited over last week's IBM announcement of [4Q07 and full year 2007 results], which includesdouble-digit growth in our IBM System Storage business, led by sales of our DS8000, SAN Volume Controller and Tapesystems. Obviously, as an IBM employee and stockholder, I am biased, so instead I thought I would provide someexcerpts from other bloggers and journalists.
But what was striking in the company’s conference call on Thursday afternoon was the unhedged optimism in its outlook for 2008, given the strong whiff of recession fear elsewhere.
The questions from Wall Street analysts in the conference call had a common theme. Why are you so comfortable about the 2008 outlook? Now, that might just be professional churlishness, since so many of them have been so wrong recently about I.B.M. Wall Street had understandably thought, for example, that I.B.M.’s sales to financial services companies — the technology giant’s largest single customer category — would suffer in the fourth quarter, given the way banks have been battered by the mortgage credit crunch.
But Mr. Loughridge said that revenue from financial services customers rose 11 percent in the fourth quarter, to $8 billion. The United States, he noted, accounts for only 25 percent of I.B.M.’s financial services business.
The other thing that seems apparent is how much I.B.M.’s long-term strategy of moving up to higher-profit businesses and increasingly relying on services and software is working. Its huge services business grew 17 percent to $14.9 billion in the quarter. After the currency benefit, the gain was 10 percent, but still impressive. Software sales rose 12 percent to $6.3 billion.
Looking at IBM's business segments, it can be seen that they offer far more coverage of the technology space that those of the typical tech company:
IBM is just so big and diversified that there is little comparison between it and most other tech companies. IBM is a member of an elite group of companies like Cisco Systems (CSCO), Microsoft (MSFT), Oracle (ORCL) or Hewlett-Packard (HPQ).
IBM's wide international coverage and deep technological capabilities dwarf those of most tech companies. Not only do they have sales organizations worldwide but they have developers, consultants, R&D workers and supply chain workers in each geographic region. Their product mix runs from custom software to packaged enterprise software, hardware (mainframes and servers), semiconductors, databases, middleware technology, etc., etc. There are few tech companies that even attempt to support that many kinds and variations of products.
As color on the fourth quarter earnings announcement, there are a couple of observations that I would like to make. The first one speaks to IBM's international prowess. The company indicated that growth in the Americas was only 5%. International sales were a primary driver of IBM's good results. As an insight on the difference between IBM and most other tech companies, it is clear that nowadays, a tech company that isn't adept at selling internationally is going to be in trouble.
Terrific performance in a terrific year - no doubt a result of its strong global model. IBM operates in 170 countries, with about 65% of its employees outside US and about 30% in Asia Pacific. For fiscal 2007, revenues from Americas grew 4% to $41.1 billion (42% of total revenue), [EMEA] grew 14% to $34.7 billion (35%of total revenue), and Asia-Pacific grew by 11% to $19.5 billion (19.7% of total revenue). IBM sees growth prospects not just in [BRIC] but also countries like Malaysia, Poland, South Africa, Peru, and Singapore.
Thus far 2008–all two weeks of it–hasn’t been a pretty for the tech industry. Worries about the economy prevail. And even companies that had relatively good things to say like Intel get clobbered. It’s ugly out there–unless you’re IBM.
I am sure there will be more write-ups and analyses on this over the next coming weeks, and others will probably waituntil more tech companies announce their results for comparison.
Fellow Blogger BarryB mentions "chunk size" in his post [Blinded by the light],as it relates to Symmetrix Virtual Provisioning capability. Here is an excerpt:
I mean, seriously, who else but someone who's already implemented thin provisioning would really understand the implications of "chunk" size enough to care?
For those of you who don't know what the heck "chunk size" means (now listen up you folks over at IBM who have yet to implement thin provisioning on your own storage products), a "chunk" is the term used (and I think even trademarked by 3PAR) to refer to the unit of actual storage capacity that is assigned to a thin device when it receives a write to a previously unallocated region of the device.
For reference, Hitachi USP-V uses I think a 42MB chunk, XIV NEXTRA is definitely 1MB, and 3PAR uses 16K or 256K (depending upon how you look at it).
Thin Provisioning currently offered in IBM System Storage N serieswas technically "implemented" by NetApp, and that the Thin Provisioning that will be offered in our IBM XIV Nextrasystems will have been acquired from XIV. Lest I remind you that many of EMC's products were developed by other companies first, then later acquired by EMC, so no need for you to throw rocks from your glass houses in Hopkington.
"Thin provisioning" was first introduced by StorageTek in the 1990's and sold by IBM under the name of RAMAC Virtual Array (RVA). An alternative approach is "Dynamic Volume Expansion" (DVE). Rather than giving the host application a huge 2TB LUN but actually only use 50GB for data, DVE was based on the idea that you only give out 50GB they need now, but could expand in place as more space was required. This was specifically designed to avoid the biggest problem with "Thin Provisioning" which back then was called "Net Capacity Load" on the IBM RVA, but today is now referred to as "over-subscription". It gave Storage Administrators greater control over their environment with no surprises.
In the same manner as Thin Provisioning, DVE requires a "chunk size" to work with. Let's take a look:
On the DS4000 series, we use the term "segment size", and indicate that the choice of a segment size can have some influence on performance in both IOPS and throughput. Smaller segment sizes increase the request rate (IOPS) by allowing multiple disk drives to respond to multiple requests. Large segment sizes increase the data transfer rate(Mbps) by allowing multiple disk drives to participate in one I/O request. The segment size does not actually change what is stored in cache, just what is stored on the disk itself.It turns out in practice there is no advantage in using smaller sizes with RAID 1; only in a few instances does this help with RAID-5 if you can writea full stripe at once to calculate parity on outgoing data. For most business workloads, 64KB or 128KB are recommended. DVE expands by the same number of segments across all disks in the RAID rank, so for example in a 12+P rank using 128KB segment sizes, the chunk size would be thirteen segments, about 1.6MB in size.
SAN Volume Controller
On the SAN Volume Controller, we call this "extent size" and allow it to be various values 64MB to 512MB. Initially,IBM only managed four million extents, so this table was used to explain the maximum amount that could be managedby an SVC system (up to 8 nodes) depending on extent size selected.
IBM thought that since we externalized "segment size" on the DS4000, we should do the same for the SANVolume Controller. As it turned out, SVC is so fast up in the cache, that we could not measure any noticeable performance difference based on extent size. We did have a few problems. First, clients who chose 16MB andthen grew beyond the 64TB maximum addressable discovered that perhaps they should have chosen something larger.Second, clients called in our help desk to ask what size to choose and how to determine the size that was rightfor them. Third, we allowed people to choose different extent sizes per managed disk group, but that preventsmovement or copies between groups. You can only copy between groups that use the same extent size. The generalrecommendation now is to specify 256MB size, and use that for all managed disk groups across the data center.
The latest SVC expanded maximum addressability to 8PB, still more than most people have today in their shops.
Getting smarter each time we introduce new function, we chose 1GB chunks for the DS8000. Based on a mainframebackground, most CKD volumes are 3GB, 9GB, or 27GB in size, and so 1GB chunks simplified this approach. Spreadingthese 1GB chunks across multiple RAID ranks greatly reduced hot-spots that afflict other RAID-based systems.(Rather than fix the problem by re-designing the architecture, EMC will offer to sell you software to help you manually move data around inside the Symmetrix after the hot-spot is identified)
Unlike EMC's virtual positioning, IBM DS8000 dynamic volume expansion does work on CKD volumes for our System z mainframe customers.
The trade-off in each case was between granularity and table space. Smaller chunks allow finer control on the exact amount allocated for a LUN or volume, but larger chunks reduced the number of chunks managed. With our advanced caching algorithms, changes in chunk size did not noticeably impact performance. It is best just to come up with a convenient size, and either configure it as fixed in the architecture, or externalize it as a parameter with a good default value.
Meanwhile, back at EMC, BarryB indicates that they haven't determined the "optimal" chunk size for their newfunction. They plan to run tests and experiments to determine which size offers the best performance, and thenmake that a fixed value configured into the DMX-4. I find this funny coming from the same EMC that won't participate in [standardized SPC benchmarks] because they feel that performance is a personal and private matter between a customer and their trusted storage vendor, that all workloads are different, and you get the idea. Here's another excerpt:
Back at the office, they've taking to calling these "chunks" Thin Device Extents (note the linkage back to EMC's mainframe roots), and the big secret about the actual Extent size is...(wait for it...w.a.i.t...for....it...)...the engineers haven't decided yet!
That's right...being the smart bunch they are, they have implemented Symmetrix Virtual Provisioning in a manner that allows the Extent size to be configured so that they can test the impact on performance and utilization of different sizes with different applications, file systems and databases. Of course, they will choose the optimal setting before the product ships, but until then, there will be a lot of modeling, simulation, and real-world testing to ensure the setting is "optimal."
Finally, BarryB wraps up this section poking fun at the chunk sizes chosen by other disk manufacturers. I don't knowwhy HDS chose 42MB for their chunk size, but it has a great[Hitchiker's Guide to the Galaxy]sound to it, answering the ultimate question to life, the universe and everything. Hitachi probably went to theirDeep Thought computer and asked how big should their "chunk size" be for their USP-V, and the computer said: 42.Makes sense to me.
I have to agree that anything smaller than 1MB is probably too small. Here's the last excerpt:
Now, many customers and analysts I've spoken to have in fact noted that Hitachi's "chunk" size is almost ridiculously large; others have suggested that 3PAR's chunks are so small as to create performance problems (I've seen data that supports that theory, by the way).
Well, here's the thing: the "right" chunk size is extremely dependent upon the internal architecture of the implementation, and the intersection of that ideal with the actual write distribution pattern of the host/application/file system/database.
So my suggestion to EMC is, please, please, please take as much time as you need to come up with the perfect"chunk size" for this, one that handles all workloads across a variety of operating systems and applications, from solid-state Flash drives to 1TB SATA disk. Take months or years, as long as it takes. The rest of the world is in no hurry, as thin provisioning or dynamic volume expansion is readily available on most other disk systems today.
Maybe if you ask HDS nicely, they might let you ask their computer.
This week was the 2008 MacWorld conference. I thought I would reflect on some of the storage related aspects of the products mentioned by Steve Jobsin his Keynote address.Many were updated version of products introduced last year's MacWorld. (In case you forgot whatthose were, here ismy post that covered [MacWorld 2007]).
(Disclaimer: IBM has a strong working relationship with Apple, and manufacturers technology used in someof Apple's products. I own both an Apple iPod as well as an Apple G4 Mac Mini. IBM supports its employees usingApple laptops instead of Windows-based ones for work, and IBM has developed software that runs on Apple's OS X.Apple is kind enough to extend its "employee discount prices" to IBM employees.)
In the first 90 days of its release, Apple sold 5 million copies, representing 19 percent of Mac users. I am stillone of the 81 percent still using 10.4 Tiger, the previous level. My Mac Mini is based on G4 POWER processor, and upgrading is on my [Someday/Maybe] list. I am not taking sides in the [OS X vs. Windows vs. Linux religious debate]; I use all three.
The key storage-related feature of Leopard is their backup software Time Machine, and Steve Jobs announceda companion product called Time Capsule that would serve as the external backup disk wirelessly, over 802.1nWi-Fi. For many households, backup is either never done, or done rarely, so any help to simplify and relieve theburden is welcome.
Time Capsule comes in 500GB and 1TB SATA disk capacities, which Steve Jobs called "server-grade". What about a 750GB model? Looks like Apple followed EMC'sexample and went straight to 1TB instead. After EMC failed to deliver 750GB drives in 2007 that they [promised back in July], EMC blogger Chuck Hollis explains in his post[Enterprise Storage Strikes Back!]:
So there's something in the EMC goodie bag as well for you -- the availability of the new 1TB disk drives you've been hearing about. We skipped the 750GB drive and went right to the 1TB drive.
Apple iPhone and iPod Touch
In the first 200 days, Apple has sold 4 million phones, and has garnered nearly 20 percent of the smart phone market share. New features include a GPS-like location feature that uses [triangulation] with cell phone towers and Wi-Fi hotspotsto determine where you are located.
I covered last year's introduction of the iPhone in my post on [Convergence].All of the features he presented were software updates to the existing 8GB and 16GB models. No new modelswith larger storage were introduced.
I am a T-mobile customer, so am out of luck until either (a) Apple unlocks their phones from the AT&T network, or(b) Apple signs an agreement with T-mobile in the USA. I reviewed the various hacks to unlock iPhones last year, but was not interested in losing official warranty or future software support.
The iPod Touch is an interesting alternative. It is basically an iPhone with the cell-phone features disabled, whichgives you Wi-Fi over the Safari browser, music, videos, and so on. Steve Jobs mentioned enhanced software updates for this as well. The iPod Touch comes in the same 8GB and 16GB sizes as the iPhone.
AppleTV and iTunes
Steve Jobs indicated that they have sold over 4 billion songs over iTunes, 125 million TV shows, and 7 million movies.He announced that now iTunes would allow for movie rentals, with the option to see them within 30 days, but once you started watching a movie, you have 24 hours to finish. I found it interesting that he said rentals were to reduce space on your hard drive, versus outright purchase of movie content.
In a rare concession, Steve admitted that the original AppleTV misunderstood the marketplace. The original AppleTV allowed you to view pictures and listen to music through your television, but people wanted to view movies. Thesoftware upgrade would allow this, using the iTunes rental model above, as well as watch video podcasts and over 50 million videos posted on YouTube.
Some television-related stats from [z/Journal] were quite timely. The older non-digital TVs could be usedwith the AppleTV and gaming systems like Nintendo Wii.
33 percent of U.S. households do not know what to do with (their older) TVs after digital switch (Feb 2009)
69 percent of Americans think PCs are more entertaining than TV
Rather than try to fight peer-to-peer website piracy, Apple cleverly decided to compete head-to-head against it. This iswell summarized in Matt Mason's 6-minute video [The Pirate's Dilemma]. Eleven major movie studios are on board with Apple's movie rental plans, making thousands of movietitles available for this, with hundreds in High Definition (HD).
I personally have a Tivo, connected wirelessly to a regular non-HD television, as well as my PC, Mac and internet hub, and this allows me to view my photos, listen to my iTunes collection of music and internet radio stations from [Live365], as well as rent movies and TV shows from Amazon Unbox, with prices ranging from free to four dollars.
The theme of this week was "Something is in the Air", an obvious reference to this product, billed as the world's thinnest laptop.John Windsor on his YouBlog writes[Making it Memorable] aboutthe use of a standard office envelope to demonstrate how thin this new MacBook Air laptop is. It is 0.16 inchesat one end, and 0.76 inches as the other end. Unlike other "ultra-thin" laptops, this has a full-size back-lit keyboardand full-size 13.3 inch widescreen. The touchpad supports multi-touch gestures similar to the iPhone and iPod Touch.Intel managed to shrink down their Core 2 Duo processor chip by 60 percent to fit inside this machine. Thebattery is reported to last five hours.
This laptop was designed for wireless access, with 802.1n and BlueTooth enabled. No RJ-45 connection for traditionalLAN ethernet connection, but I guess you can use a USB-to-RJ45 converter.
Storage-wise, you can choose between the 1.8-inch 80GB HDD or a pricey-but-faster 64GB Flash Solid-State Disk (SSD).In a move similar to [getting rid of the 3.5-inch floppy disk in 1998's iMac G3], the MacBook Air got rid of the CD/DVDdrive. While they offer a USB-attachable SuperDrive as an optional peripheral, Steve Jobs gave alternative methods:
Watching movies on DVD
Rent or Buy from iTunes instead
Burning music CDs for your car stereo
Attach your iPod to your car stereo
Taking backups to CD or DVD
Use Time Machine and Time Capsule instead
Installing Software from CD
Wirelessly connect to a "Remote Optical Disc" on a Mac or PC, running special Apple-provided software that allows you to make this connection
Here's a list to the 90-minute[keynote address video]. If you arenot a fan of recycling, saving the environment, free speech or democracy, you can safely skip the last 15 minutes when musical artist Randy Newman performs.For alternative viewpoints on the keynote, see posts from [John Gruber] and [Tara MacKay].
In addition to creating the Dilbert cartoon, Scott Adams has a blog, which sometimes is quite serious,and other times quite funny. The anticipated 30x cost of "Flash Drives" for Enterprise disk systems reminded meof one of Scott's articles from November 2007 titled [Urge to Simplify].Here's an excerpt:
Now the casinos have people trained, like chickens hoping for pellets, to take money from one machine (the ATM), carry it across a room and deposit in another machine (the slot machine). I believe B.F. Skinner would agree with me that there is room for even more efficiency: The ATM and the slot machine need to be the same machine.
The casinos lose a lot of money waiting for the portly gamblers with respiratory issues to waddle from the ATM to the slot machines. A better solution would be for the losers, euphemistically called “players,” to stand at the ATM and watch their funds be transferred to the hotel, while hoping to somehow “win.” The ATM could be redesigned to blink and make exciting sounds, so it seems less like robbery.
I’m sure this is in the five-year plan. Longer term, people will be trained to set up automatic transfers from their banks to the casinos. People will just fly to Vegas, wander around on the tarmac while the casino drains their bank accounts, then board the plane and fly home. The airlines are already in on this concept, and stopped feeding you sandwiches a while ago.
Perhaps EMC can redesign its DMX-4 to "blink and make exciting sounds" as well. The Flash Drives were designedfor the financial services industry, so those disk systems could be directly connected to make transfers between the appropriate bank accounts.
When times are tough, people revert back to their "default programming", and companies search for their"core strengths".The Redwoods Group calls this the[Native Language Theory]. Here'san excerpt:
A young carpenter immigrates to the United States from Italy, unable to speak a word of English. Upon arrival, he moves into a small apartment by himself and begins looking for a job in construction. With some luck and a lot of hard work, he quickly lands a job at a local construction site. Over the coming weeks he learns how to say “hello” and “goodbye” to his English-only coworkers. As time goes on, he is able to learn more complex phrases and commands and is now able to begin taking on jobs that better match his level of expertise.
Several years after the carpenter moved to the US, he now speaks fluent English and has started a family with an American woman and now speaks only English on the job site and at home. One afternoon, while hammering at the framing of a new home, the carpenter strikes his thumb. In what language does he curse?Italian, of course.
We believe that this story illustrates the nature of reacting to difficult, stressful, and, yes, painful situations by reverting to what you know best. This is the reason that coaches ask their players to make certain actions “instinctual” – simply, when times get tough, we do what we fall back on our native language.
Last September, in my post[Supermarketsand Specialty Shops] I mentioned how Forrester Research identified two kinds of IT vendors selling storage. On one side were the"information infrastructure" companies (IBM, HP, Sun, and Dell) that focus on providing one-stop shopping for clients that want all parts of an IT solution, including servers, storage, software and services. These I compared to "supermarkets".
On the other side were the storage component vendors (EMC, HDS, NetApp, and many others) that focus on specificstorage components. These I compared to "specialty shops", like butchers, bakers and candlestick makers.These often appeal to customers with big enough IT staffs with the skills to do their own system integration.The key difference seems to be that the supermarkets are client-focused, and the specialty shops are technology-focused, and different people prefer to do business with one side or another.This came in handy last November to explain Dell's acquisition of EqualLogic and discuss[IBMEntry-Level iSCSI offerings].
Some recent news seems to fit this model, in relation to the Native Language Theory.
Several argued that EMC was in the process of shifting sides, from disk specialty shop over to an everything-but-servers supermarket. Certainly many of its acquisitions in software, services, and VMwarewould support the notion that perhaps they are going through an identity crisis.The immediate beneficiary was HDS, the #2 disk specialty shop, that passedup EMC with innovative features in its USP-V disk system.
However, times are tough, especially in the U.S. economy that many storage vendors are focused on. EMCappears to have found its native language, going back to its roots of solid state storage systems thatthey started with back in 1979. This week EMC announced [Symmetrix DMX-4 support of Flash drives].Several bloggers review the technology involved:
Overall smart move for EMC to go back to its technology-focused disk specialty shop mode and go head-to-head against the HDS threat. With Web 2.0 workloads moving off these monolithic solutions and onto [clustered storage more appropriate for "cloud computing"], large enterprise-class disk systems like theIBM System Storage DS8000 and EMC DMX-4 can shift focus on what they do best: online transaction processing (OLTP) and large databases. However,I noticed the EMC press release mentions EMC as an "information infrastructure" company, so perhaps they stillhaven't resolved their identity crisis.
After Sun acquired StorageTek specialty shop, they too had a bit of an identity crisis.Fortunately, they realized their core strengths were on the "supermarket" side,moved storage in with servers in their latest restructuring, changed their NYSE symbol from SUNW to JAVA, and reset their focus on providing end-to-end solutions like IBM. For example, fellow blogger Taylor Allis from Sun mentions their latest in "clustered storage" in his post[IBM Buys XIV - Good Move].
In an ironic twist, some of today's leading manufacturers of server computers are also among the companies moving most aggressively to reduce their need for servers and other hardware components. Hewlett-Packard, for instance, is in the midst of a project to slash the number of data centers it operates from 85 to 6 and to cut the number of servers it uses by 30 percent. Now, Sun Microsystems is upping the stakes. Brian Cinque, the data center architect in Sun's IT department, says the company's goal is to close down all its internal data centers by 2015. "Did I just say 0 data centers?" he writes on his blog."Yes! Our goal is to reduce our entire data center presence by 2015."
While Nick feels this is ironic for Sun, known for UNIX servers based on their SPARC chip technology, I don't. Sun has shifted from being technology-focused to being client-focused.This is where the marketplace is going, and the supermarket vendors, being client-focused, are best positioned to adapt to this new world. In a sense, Sun found its roots. Nick summarizes this as:"The network, to spin the old Sun slogan, becomes the data center."
So, each move seems to strengthen their respective identities back to their origins, or at least help them communicate that to the market.
Christopher Carfi on his Social Customer Manifesto blog has a great post[Let's Look at the Big Picture]that talks about Information as the new form of "money" by looking at how the concept of "money" wasfirst formed 150 years ago. Here's an excerpt:
Lesson 1: "Money" was very fragmented for a very long period of time after the colonization of North America
"Money" as we think of it in the form of cash/paper currency has only been around for about 150 years. Over a period of almost two hundred years both before and after that time, a number of fragmented methods were used to exchange value.
Lesson 2: Everybody needs to win
After the ideas of "cash" and "checks" had taken hold and become widespread, there were still many inefficiencies in the system. Cash is cumbersome, and subject to loss. Checks may bounce. This continued until the mid-1900's.
Enter the credit card.*
The credit card resonated with both customers and vendors because both parties received benefits.
Now, the widespread usage of credit cards was not something the occurred overnight. Instead, it was something that occurred over a generation. In 1970, only 16% of American households had credit cards. However, by 1995, that number had climbed to 65%.
We are now looking at Information in much the same way. It is fragmented, it is used to represent value, it is hoarded by some, shared by others. In much that "brown" is the new "black", does that mean "information" is the new"money"?
A related blog post from Shawn over at Anecdote discusses a panelist discussion of Albert Camus' work,The Stranger. Here is an excerpt:
... meaning is not pre-inscribed in the world around us and we are continuously seeking meaning in an inherently meaningless world. I almost toppled off the step machine. Do we live in an inherently meaningless world? On first thought I think the answer is yes. The onus is on us to make sense of our world.
And here is where information, by itself, is not of value unless people place value on it. Just as people valued Wampum and Furs, and could therefore trade it for other goods, people trade information for other itemsof value. But the onus is on us to make sense of the information, to determine the meaning of it, and use thisto help drive business or other accomplishments.
Are you leveraging information as well as investors leverage other people's money? If not, IBM can help.
Rather than a target weight, I chose a target waist measurement, but did not quite make this one. I did keep up with my weekly exercise regime, but we recently installed an "ice cream freezer" here at work, and I have failed to resist temptation.
Reduce, Reuse and Recycle
In my post [Stayingon Budget], I resolved to "reduce, reuse and recycle". I have taken measures to de-clutter and simplify mylife, and already things are paying off. So I am happy about this one.
Learn to Better use Lotus Notes and Office 2007 software
In my post [Honeyour Tools and Skills], I resolved to learn how to better use Lotus Notes and Office 2007. We never got Office 2007.In a surprise move, IBM put out Lotus Symphony, an Office 2007 replacement. Lotus Symphony works on IBM's three approved recognized desktop platforms (Windows XP, Linux and Mac OS X). Here's a collection of [IBM Press Releases about Lotus Symphony].
I did learn how to better use Lotus Notes,thanks to Alan Lepofsky's blog [IBM Lotus Notes Hints, Tips, and Tricks].Ironically, the best help for dealing with Lotus Notes was not the software itself, but the skills in handling emailin general. This includes:
Resist the urge to copy the world, and better use "bcc" to be kind to upper management on "reply all" respondents.
Avoid attaching large documents, but use URL's to NAS file shares, websites, or [YouSendIt.com] instead. Obviously, the recipient has to have access to whatever you point to, but it greatly reduces total email volume and improves transmission over wireless.
Delegate. A lot of times I was the "middleman" between someone asking a question, and someone else Iknew had the answer. Now, I just introduce them together and step out of the way.
Checking email only a few times a day. I use to check my email every 5-10 minutes, now only 2-4 times per day.
In my post, [Lighten Up], I resolved to laugh more, stretch more, get enough sleep, and listen to music more. I participated in monthly[Tucson Laughter Club]events, incorporated stretching in my weekly exercise program, have gotten more sleep, and rediscovered some of my older music that I hadn't listened to in a while. Overall, I feel happy I met this one.
My New Year's Resolutions for 2008:
Improve my writing skills
Going back through my past blog postings, some of my sentences and paragraphs were frightful. I resolve toimprove my sentence and paragraph structure, and make better use of HTML tags to improve the layout andformatting.
Improve my HTML and Web design skills
Contribute to the OLPC Foundation
Last year, as a "Day 1 Donor", I had donated to this important charitable organization to help educate the childrenof third world nations. This year, I plan to learn Python and other programming languages used on the XO laptop,and see how I can contribute my skills and expertise on the OLPC forums.
Eat Healthier and Drink more
I think my downfall with last year's resolution was that it was merely a goal, 35 inch waist, rather thana "call for action". This year, I plan to eat more fish, salads, whole grains and other heart-healthy foods.
While many people resolve to "Quit Drinking", I need to drink more. My doctor, my personaltrainer, and even my interpreter teams, have asked me to do so. We live in Tucson, Arizona, during a centuryof global warming, and dehydration can cause stress on the body.
Attend more movies and film-making events
Last year, I joined the Tucson Film Society, and produced[my first film], part of which was filmedfrom Bogota, Colombia. I got invited to see a lot of independent films, premieres, and film-maker events, but did not attend many. I resolve to attend more in 2008.
Get better Organized
Moving offices from one building to another brought to light that I wasn't well organized. While I havemade some efforts to de-clutter my home, I need to step this up to my work as well.
I decided to start with something very non-tech, a [Hipster PDA]. I have nowmet or heard several people who use this approach successfully, and have decided to give it a try.
Hopefully, this list might inspire you to come up with your own resolutions. Not surprisingly, writing them in a public forum helped me keep most of them, and stick to my resolutions throughout the year.
Whew! I am glad that is over. The BarryB circus has left town, he has decided to [move on to other topics], and I am now to clean up the ["circus gold"] leftbehind. I would like to remind everyone that all of these discussions have been about the architecture,not the product. IBM will come out withits own version of a product based on Nextra later in 2008, which may be different than the product that XIV currentlysells to its customers.
RAID-X does not protect against double-drive failures as well as RAID-6, but it's very close
BarryB calls this the "Elephant in the room", that RAID-6 protects better against double-drive failures. I don't dispute that. He also credits me with the term "RAID-X", but I got this directly from the XIV guys. It turns out this was already a term used among academic research circles for [distributed RAID environments]. Meanwhile, Jon Toigo feels the term RAID-X sounds like a brand of bug spray in his post[XIV Architecture: What’s Not to Like?]Perhaps IBM can change this to RAID-5.99 instead.
If you measure risk of a second drive failing during the rebuild or re-replication process ofa first drive failure, you can measure the exposure by multiplying the amount of GB at risk by thenumber of hours that the second failure could occur, resulting in a unit of "GB-hours". Here Ilist best-case rebuild times, your mileage may vary depending on whether other workloads existon the system competing for resources. Notice that 8-disk configurations of RAID-10 and RAID-5for smaller FC disk are in the triple digits, and larger SATA disk in five digits, but that with RAID-X it is only single digits. That is orders of magnitude closer to the ideal.
For each RAID type, the risk is proportional to the square of the individual drive size.Double the drive size causes the risk to be four times greater.This is not the first time this has been discussed. In [Is RAID-5 Getting Old?], Ramskovquotes NetApp's response in Robin Harris' [NetApp Weighs In On Disks]:
...protecting online data only via RAID 5 today verges on professional malpractice.
As disks get older, RAID-6 will not be able to protect against 3-drive failures. A similar chartabove could show the risk to data after the second drive fails and both rebuilds are going on,compared to the risk of a third drive failure during this time. The RAID-X scheme protects muchbetter against 3-drive failures than RAID-6.
Nothing in the Nextra architecture prevents a RAID-6, Triple-copy, or other blob-level scheme
In much the same way that EMC Centera is RAID-5 based for its blobs, there is nothing in the Nextra architecturethat prevents taking additional steps to provide even better protection, using a RAID-6 scheme, making three copiesof the data instead of two copies, or something even more advanced. The current two-copy scheme for RAID-X is betterthan all the RAID-5 and RAID-10 systems out in the marketplace today.
Mirrored Cache won't protect against Cosmic rays, but ECC detection/correction does
BarryB incorrectly states that since some implementations of cache are non-mirrored, that this implies they are unprotected against Cosmic rays. Mirroring does not protect against bit-flips unless both copies arecompared for differences. Unfortunately, even if you compared them, the best you can do is detect theyare different, there is no way of knowing which version is correct.Mirroring cache is normally done to protect uncommitted writes. Reads in cacheare expendable copies of data already written to disk, so ECC detection/correction schemes are adequateprotection. ECC is like RAID for DRAM memory. A single bit-flip can be corrected, multiple bit-flipscan be detected. In the case of detection, the cache copy is discarded and read fresh again from disk.IBM DS8000, XIV and probably most other major vendor offerings use ECC of some kind. BarryB is correctthat some cheaper entry-level and midrange offerings from other vendors might cut corners in this area.I don't doubt BarryB's assertion that the ECC method used in the EMC products may be differently implemented than theECC in the IBM DS8000, but that doesn't mean the IBM DS8000's ECC implementation is flawed.
ECC protection is important for all RAID systems that perform rebuild, and even more importantthe larger the GB-hours listed in the table above.
XIV is designed for high-utilization, not less than 50 percent
I mentioned that the typical Linux, UNIX or Windows LUN is only 30-50 percent full, and perhaps BarryBthought I was referring to the typical "XIV customer". This average is for all disk storage systems connectedto these operating systems, based on IBM market research and analyst reports. The XIV is expected to run at much higher utilization rates, and offers features like "thin provisioning" and "differential snapshot" to make this simple to implement in practice.
Most often, disks don't fail without warning. Usually, they give out temporary errors first, and then fail permanently.The XIV architecture allows for pre-emptive self-repair, initiating the re-replication process after detecting temporary errors, rather than waiting for a complete drive failure.
I had mentioned that this process used "spare capacity, not spare drives" but I was notified that there are three spare drives per system to ensure that there is enough spare capacity, so I stand corrected.
New drives don't have to match the same speed/capacity as the new drives, so three to five years from now, whenit might be hard to find a matching 500GB SATA drive anymore, you won't have to.
No RAID scheme eliminates backups or Business Continuity Planning
The XIV supports both synchronous and asynchronous disk mirroring to remote locations. Backup software willbe able to backup data from the XIV to tape. A double drive failure would require a "recovery action", eitherfrom the disk mirror, or from tape, for the few GB of data that need to be recovered.
A third alternative is to allow end-users to receive backups of their own user-generated content. For example, I have over 15,000 photos uploaded over the past six years to Kodak Photo Gallery, which I use to share with my friends and family. For about $180 US dollars, they will cut DVDs containing all of my uploaded files and send them to me, so that I do not have to worry about Kodak losing my photos.In many cases, if a company or product fails to deliver on its promises, the most you will get is your money back, but for "free services" like HotMail, FreeDrive, FlickR and others, you didn't pay anything in the first place, andthey may point this limitation of liability in the "terms of service".
XIV can be used for databases and other online transaction processing
The XIV will have FCP and iSCSI interfaces, and systems can use these to store any kind of data you want. I mentionedthat the design was intended for large volumes of unstructured digital content, but there is nothing to prevent the use of other workloads. In today's Wall Street Journal article[To Get Back Into the Storage Game, IBM Calls In an Old Foe]:
Today, XIV's Nextra system is used by Bank Leumi, a large Israeli bank, and a few other customers for traditional data-storage tasks such as recording hundreds of transactions a minute.
BarryB, thanks for calling the truce. I look forward to talking about other topics myself. These past two weeks have been exhausting!
In my post yesterday [Spreading out the Re-Replication process], fellow blogger BarryB [aka The Storage Anarchist]raises some interesting points and questions in the comments section about the new IBM XIV Nextra architecture.I answer these below not just for the benefit of my friends at EMC, but also for my own colleagues within IBM,IBM Business Partners, Analysts and clients that might have similar questions.
If RAID 5/6 makes sense on every other platform, why not so on the Web 2.0 platform?
Your attempt to justify the expense of Mirrored vs. RAID 5 makes no sense to me. Buying two drives for every one drive's worth of usable capacity is expensive, even with SATA drives. Isn't that why you offer RAID 5 and RAID 6 on the storage arrays that you sell with SATA drives?
And if RAID 5/6 makes sense on every other platform, why not so on the (extremely cost-sensitive) Web 2.0 platform? Is faster rebuild really worth the cost of 40+% more spindles? Or is the overhead of RAID 6 really too much for those low-cost commodity servers to handle.
Let's take a look at various disk configurations, for example 3TB on 750GB SATA drives:
JBOD: 4 drives
JBOD here is industry slang for "Just a Bunch of Disks" and was invented as the term for "non-RAID".Each drive would be accessible independently, at native single-drive speed, with no data protection. Puttingfour drives in a single cabinet like this provides simplicity and convenience only over four separate drivesin their own enclosures.
RAID-10: 8 drives
RAID-10 is a combination of RAID-1 (mirroring) and RAID-0 (striping). In a 4x2 configuration, data is striped across disks 1-4,then these are mirrored across to disks 5-8. You get performance improvement and protection against a singledrive failure.
RAID-5: 5 drives
This would be a 4+P configuration, where there would be four drives' worth of data scattered across fivedrives. This gives you almost the same performance improvement as RAID-10, similar protection againstsingle drive failure, but with fewer drives per usable TB capacity.
RAID-6: 6 drives
This would be a 4+2P configuration, where the first P represents linear parity, and the second represents a diagonal parity. Similar in performance improvement as RAID-5, but protects against single and double drive failures, and still better than RAID-10 in terms of drives per TB usable capacity.
For all the RAID configurations, rebuild would require a spare drive, but often spares are shared among multiple RAID ranks, not dedicated to a single rank. To this end, you often have to have several spares per I/O loop, and a different set of spares for each kind of speed and capacity. If you had a mix of 15K/73GB, 10K/146GB, and 7200/500GB drives, then you would have three sets of spares to match.
In contrast, IBM XIV's innovative RAID-X approach doesn't requireany spare drives, just spare capacity on existing drives being used to hold data. The objects can be mirroredbetween any two types of drives, so no need to match one with another.
All of these RAID levels represent some trade-off between cost, protection and performance, and IBM offers each of theseon various disk systems platforms. Calculating parity is more complicated than just mirrored copies, but this can be done with specialized chips in cache memory to minimize performance impact.IBM generally recommends RAID-5 for high-performance FC disk, and RAID-6 for slower, large capacity SATA disk.
However, the questionassumes that the drive cost is a large portion of the overall "disk system" cost. It isn't. For example,Jon Toigo discusses the cost of EMC's new AX4 disk system in his post [National Storage Rip-Off Day]:
EMC is releasing its low end Clariion AX4 SAS/SATA array with 3TB capacity for $8600. It ships with four 750GB SATA drives (which you and I could buy at list for $239 per unit). So, if the disk drives cost $956 (presumably far less for EMC), that means buyers of the EMC wares are paying about $7700 for a tin case, a controller/backplane, and a 4Gbps iSCSI or FC connector. Hmm.
Dell is offering EMC’s AX4-5 with same configuration for $13,000 adding a 24/7 warranty.
(Note: I checked these numbers. $8599 is the list price that EMC has on its own website. External 750GB drivesavailable at my local Circuit City ranged from $189 to $329 list price. I could not find anything on Dell'sown website, but found [The Register] to confirm the $13,000 with 24x7 warranty figure.)
Disk capacity is a shrinking portion of the total cost of ownership (TCO). In addition to capacity, you are paying forcache, microcode and electronics of the system itself, along with software and services that are included in the mix,and your own storage administrators to deal with configuration and management. For more on this, see [XIV storage - Low Total Cost of Ownership].
EMC Centera has been doing this exact type of blob striping and protection since 2002
As I've noted before, there's nothing "magic" about it - Centera has been employing the same type of object-level replication for years. Only EMC's engineers have figured out how to do RAID protection instead of mirroring to keep the hardware costs low while not sacrificing availability.
I agree that IBM XIV was not the first to do an object-level architecture, but it was one of the first to apply object-level technologies to the particular "use case" and "intended workload" of Web 2.0 applications.
RAID-5 based EMC Centera was designed insteadto hold fixed-content data that needed to be protected for a specific period of time, such as to meet government regulatory compliance requirements. This is data that you most likelywill never look at again unless you are hit with a lawsuit or investigation. For this reason, it is important to get it on the cheapest storage configuration as possible. Before EMC Centera, customers stored this data on WORM tape and optical media, so EMC came up with a disk-only alternative offering.IBM System Storage DR550 offers disk-level access for themost recent archives, with the ability to migrate to much less expensive tape for the long term retention. The end result is that storing on a blended disk-plus-tape solution can help reduce the cost by a factor of 5x to 7x, making RAID level discussion meaningless in this environment. For moreon this, see my post [OptimizingData Retention and Archiving].
While both the Centera and DR550 are based on SATA, neither are designed for Web 2.0 platforms.When EMC comes out with their own "me, too" version, they will probably make a similar argument.
IBM XIV Nextra is not a DS8000 replacement
Nextra is anything but Enterprise-class storage, much less a DS8000 replacement. How silly of all those folks to suggest such a thing.
I did searches on the Web and could not find anybody, other than EMC employees, who suggested that IBM XIV Nextra architecture represented a replacement for IBM System Storage DS8000. The IBM XIV press release does not mentionor imply this, and certainly nobody I know at IBM has suggested this.
The DS8000 is designed for a different "use case" andset of "intended workloads" than what the IBM XIV was designed for. The DS8000 is the most popular disk systemfor our IBM System z mainframe platform, for activities like Online Transaction Processing (OLTP) and large databases, supporting ESCON and FICON attachment to high-speed 15K RPM FC drives. Web 2.0 customers that might chooseIBM XIV Nextra for their digital content might run their financial operations or metadata search indexes on DS8000.Different storage for different purposes.
As for the opinion that this is not "enterprise class", there are a variety of definitions that refer to this phrase.Some analysts look at "price band" of units that cost over $300,000 US dollars. Other analysts define this as beingattachable to mainframe servers via ESCON or FICON. Others use the term to refer to five-nines reliability, havingless than 5 minutes downtime per year. In this regard, based on the past two years experience at 40 customer locations,I would argue that it meets this last definition, with non-disruptive upgrades, microcode updates and hot-swappable components.
By comparison, when EMC introduced its object-level Centera architecture, nobody suggested it was the replacement for their Symmetrix or CLARiiON devices. Was it supposed to be?
Given drive growth rates have slowed, improving utilization is mandatory to keep up with 60-70 percent CAGR
Look around you, Tony- all of your competitors are implementing thin provisioning specifically to drive physical utilization upwards towards 60-80%, and that's on top of RAID 5/RAID 6 storage and not RAID 1. Given that disk drive growth rates and $/GB cost savings have slowed significantly, improving utilization is mandatory just to keep up with the 60-70% CAGR of information growth.
Disk drive capacities have slowed for FC disk because much of the attention and investment has been re-directed to ATA technology. Dollar-per-GB price reduction is slowing for disks in general, as researchers are hitting physicallimitations to the amount of bits they can pack per square inch of disk media, and is now around 25 percent per year.The 60-70 percent Compound Annual Growth Rate (CAGR) is real, and can be even growing faster for Web 2.0providers. While hardware costs drop, the big ticket items to watch will be software, services and storage administrator labor costs.
To this end, IBM XIV Nextra offers thin provisioning and differential space-efficient snapshots. It is designed for 60-90 percent utilization, and can be expanded to larger capacities non-disruptively in a very scalable manner.
On his The Storage Architect blog, Chris Evans wrote [Twofor the Price of One]. He asks: why use RAID-1 compared to say a 14+2 RAID-6 configuration which would be much cheaper in terms of the disk cost? Perhpaps without realizing it, answers itwith his post today [XIV part II]:
So, as a drive fails, all drives could be copying to all drives in an attempt to ensure the recreated lost mirrors are well distributed across the subsystem. If this is true, all drives would become busy for read/writes for the rebuild time, rather than rebuild overhead being isolated to just one RAID group.
Let me try to explain. (Note: This is an oversimplification of the actual algorithm in an effortto make it more accessible to most readers, based on written materials I have been provided as partof the acquisition.)
In a typical RAID environment, say 7+P RAID-5, you might have to read 7 drives to rebuild one drive, and in the case of a 14+2 RAID-6, reading 15 drives to rebuild one drive. It turns out the performance bottleneck is the one driveto write, and today's systems can rebuild faster Fibre Channel (FC) drives at about 50-55 MB/sec, and slower ATA disk at around 40-42 MB/sec. At these rates, a 750GB SATA rebuild would take at least 5 hours.
In the IBM XIV Nextra architecture, let's say we have 100 drives. We lose drive 13, and we need to re-replicate any at-risk 1MB objects.An object is at-risk if it is the last and only remaining copy on the system. A 750GB that is 90 percent full wouldhave 700,000 or so at-risk object re-replications to manage. These can be sorted by drive. Drive 1 might have about 7000 objects that need re-replication, drive 2might have slightly more, slightly less, and so on, up to drive 100. The re-replication of objects on these other 99 drives goes through three waves.
Select 49 drives as "source volumes", and pair each randomly with a "destination volume". For example, drive 1 mapped todrive 87, drive 2 to drive 59, and so on. Initiate 49 tasks in parallel, each will re-replicate the blocks thatneed to be copied from the source volume to the destination volume.
50 volumes left.Select another 49 drives as "source volumes", and pair each with a "destination volume". For example, drive 87 mapped todrive 15, drive 59 to drive 42, and so on. Initiate 49 tasks in parallel, each will re-replicate the blocks thatneed to be copied from the source volume to the destination volume.
Only one drive left. We select the last volume as the source volume, pair it off with a random destination volume,and complete the process.
Each wave can take as little as 3-5 minutes. The actual algorithm is more complicated than this, as tasks complete early the source and volumes drives are available for re-assignment to another task, but you get the idea. XIV hasdemonstrated the entire process, identifying all at-risk objects, sorting them by drive location, randomly selectingdrive pairs, and then performing most of these tasks in parallel, can be done in 15-20 minutes. Over 40 customershave been using this architecture over the past 2 years, and by now all have probably experienced at least adrive failure to validate this methodology.
In the unlikely event that a second drive fails during this short time, only one of the 99 task fails. The other 98 tasks continue to helpprotect the data. By comparison, in a RAID-5 rebuild, no data is protected until all the blocks are copied.
As for requiring spare capacity on each drive to handle this case, the best disks in production environments aretypically only 85-90 percent full, leaving plenty of spare capacity to handle re-replication process. On average,Linux, UNIX and Windows systems tend to only fill disks 30 to 50 percent full, so the fear there is not enough sparecapacity should not be an issue.
The difference in cost between RAID-1 and RAID-5 becomes minimal as hardware gets cheaper and cheaper. For every $1 dollar you spend on storage hardware, you spend $5-$8 dollars managing the environment. As hardware gets cheaper still, it might even be worth making three copies of every 1MB object, the parallel processto perform re-replications would be the same. This could be done using policy-based management, some data gets triple-copied, and other data gets only double-copied, based on whether the user selected "premium" or "basic" service.
The beauty of this approach is that it works with 100 drives, 1000 drives, or even a million drives. Parallel processingis how supercomputers are able to perform feats of amazing mathematical computations so quickly, and how Web 2.0services like Google and Yahoo can perform web searches so quickly. Spreading the re-replication process acrossmany drives in parallel, rather than performing them serially onto a single drive, is just one of the many uniquefeatures of this new architecture.
Wrapping up my week's theme on IBM's acquisition XIV, we have gotten hundreds of positive articles and reviews in the press, but has caused quite a stir with the[Not-Invented-Here] folks at EMC.We've heard already from EMC bloggers [Chuck Hollis] and [Mark Twomey].The latest is fellow EMC blogger BarryB's missive [Obligatory "IBM buys XIV" Post], which piles on the "Fear, Uncertainty and Doubt" [FUD], including this excerpt here:
In a block storage device, only the host file system or database engine "knows" what's actually stored in there. So in the Nextra case that Tony has described, if even only 7,500-15,000 of the 750,000 total 1MB blobs stored on a single 750GB drive (that's "only" 1 to 2%) suddenly become inaccessible because the drive that held the backup copy also failed, the impact on a file system could be devastating. That 1MB might be in the middle of a 13MB photograph (rendering the entire photo unusable). Or it might contain dozens of little files, now vanished without a trace. Or worst yet, it could actually contain the file system metadata, which describes the names and locations of all the rest of the files in the file system. Each 1MB lost to a double drive failure could mean the loss of an enormous percentage of the files in a file system.
And in fact, with Nextra, the impact will be across not just one, but more likely several dozens or even hundreds of file systems.
Worse still, the Nextra can't do anything to help recover the lost files.
Nothing could be further from the truth. If any disk drive module failed, the system would know exactly whichone it was, what blobs (binary large objects) were on it, and where the replicated copies of those blobs are located. In the event of a rare double-drive failure, the system would know exactly which unfortunate blobs were lost, and couldidentify them by host LUN and block address numbers, so that appropriate repair actions could be taken from remote mirrored copies or tape file backups.
Second, nobody is suggesting we are going to put a delicateFAT32-like Circa-1980 file system that breaks with the loss of a single block and requires tools like "fsck" to piece back together. Today's modern file systems--including Windows NTFS, Linux ext3, and AIX JFS2--are journaled and have sophisticated algorithms tohandle the loss of individual structure inode blocks. IBM has its own General Parallel File System [GPFS] and corresponding Scale out File Services[SOFS], and thus brings a lotof expertise to the table.Advanced distributed clustered file systems, like [Google File System] and Yahoo's [Hadoop project] take this one step further, recognizing that individual node and drive failures at the Petabyte-scale are inevitable.
In other words, XIV Nextra architecture is designed to eliminate or reduce recovery actions after disk failures, not make them worse. Back in 2003, when IBM introduced the new and innovative SAN Volume Controller (SVC), EMCclaimed this in-band architecture would slow down applications and "brain-damage" their EMC Symmetrix hardware.Reality has proved the opposite, SVC can improve application performance and help reduce wear-and-tear on the manageddevices. Since then, EMC acquired Kashya to offer its own in-band architecture in a product called EMC RecoverPoint, that offers some of the features that SVC offers.
If you thought fear mongering like this was unique to the IT industry, consider that 105years ago, [Edison electrocuted an elephant]. To understand this horrific event, you have to understand what was going on at the time.Thomas Edison, inventor of the light bulb, wanted to power the entire city of New York with Direct Current(DC). Nikolas Tesla proposed a different, but more appropriate architecture,called Alternating Current(AC), that had lower losses over distances required for a city as large and spread out as New York. But Thomas Edison was heavily invested in DC technology, and would lose out on royalties if ACwas adopted.In an effort to show that AC was too dangerous to have in homes and businesses, Thomas Edison held a pressconference in front of 1500 witnesses, electrocuting an elephant named Topsy with 6600 volts, and filmed the event so that it could be shown later to other audiences (Edison invented the movie camera also).
Today's nationwide electric grid would not exist without Alternating Current.We enjoy both AC for what it is best used for, and DC for what it is best used for. Both are dangerous at high voltage levels if not handled properly. The same is the case for storage architectures. Traditional high-performance disk arrays, like the IBM System Storage DS8000, will continue to be used for large mainframe applications, online transaction processing and databases. New architectures,like IBM XIV Nextra, will be used for new Web 2.0 applications, where scalability, self-tuning, self-repair,and management simplicity are the key requirements.
(Update: Dear readers, this was meant as a metaphor only, relating the concerns expressed above thatthe use of new innovative technology may result in the loss or corruption of "several dozen or even hundreds of file systems" and thus too dangerous to use, with an analogy on the use of AC electricity was too dangerous to use in homes. To clarify, EMC did not re-enact Thomas Edison's event, no animalswere hurt by EMC, and I was not trying to make political commentary about the current controversy of electrocution as amethod of capital punishment. The opinions of individual bloggers do not necessarily reflect the official positions of EMC, and I am not implying that anyone at EMC enjoys torturing animals of any size, or their positions on capital punishment in general. This is not an attack on any of the above-mentioned EMC bloggers, but rather to point out faulty logic. Children should not put foil gum wrappers in electrical sockets. BarryB and I have apologized to each other over these posts for any feelings hurt, and discussion should focus instead on the technologies and architectures.)
While EMC might try to tell people today that nobody needs unique storage architectures for Web 2.0 applications, digital media and archive data, because their existing products support SATA disk and can be used instead for these workloads, they are probably working hard behind the scenes on their own "me, too" version.And with a bit of irony, Edison's film of the elephant is available on YouTube, one of the many Web 2.0 websites we are talking about. (Out of a sense of decency, I decided not to link to it here, so don't ask)
Yesterday's announcement that IBM had acquired XIV to offer storage for Web 2.0 applicationsprompted a lot of discussion in both the media and the blogosphere. Several indicated thatit was about time that one of the major vendors stepped forward to provide this, and it madesense that IBM, the leader in storage hardware marketshare, would be the first. Others were perhaps confused on what is unique with Web 2.0 applications. What has changed?
I'll use this graphic to help explain how we have transitioned through three eras of storage.
The first era: Server-centric
In the 1950s, IBM introduced both tape and disk systems into a very server-centric environment.Dumb terminals and dumb storage devices were managed entirely by the brains inside the server.These machines were designed for Online Transaction Processing (OLTP), everywhere from bookingflights on airlines to handling financial transfers.
The second era: Network-centric
In the 1980s and 1990s, dumb terminals were replaced with smarter workstations and personalcomputers; and dumb storage were replaced with smarter storage controllers. Local Area Networks (LANs)and Storage Area Networks (SANs) allowed more cooperative processing between users, servers andstorage. However, servers maintained their role as gatekeepers. Users had to go through aspecific server or server cluster to access the storage they had access to. These servers continuedtheir role in OLTP, but also manage informational databases, file sharing and web serving.
The third era: Information-centric
Today, we are entering a third era. Servers are no longer the gatekeepers. Smart workstationsand personal computers are now supplemented with even more intelligent handheld devices, Blackberryand iPhones, for example. Storage is more intelligent too, with some being able to offer file sharingand web serving directly, without the need of an intervening server. The roles of servers have changed,from gatekeepers, to ones that focuses on crunching the numbers, and making information presentableand useful.
Here is where Web 2.0 applications, digital media and archives fits in. These are focused on unstructured data that don't require relational database management systems. So long as the useris authorized, subscribed and/or has made the appropriate payment, she can access the information. With the appropriate schemes in place, information can now be mashed-up in a variety of ways, combined with other information that can render insights and help drive new innovations.
Of course, we will still have databases and online transaction processing to book our flights andtransfer our funds, but this new era brings in new requirements for information storage, and newarchitectures that help optimize this new approach.
Well, it's 2008, which could mark the end to RAID5 and mark the beginnings of a new disk storagearchitecture. IBM starts the year with exciting news, acquiring new disk technology from a smallstart-up called XIV, led by former-EMCer Moshe Yanai. Moshe was ousted publicly in 2001 from hisposition as EMC's VP of engineering, and formed his own company. It didn't take long for EMC bloggersto poke fun at this already. Mark Twomey, in his StorageZilla blog, had mentioned XIV before back in August,[XIV], and again todayin [IBM Buys XIV].
To address the new requirements associated with next generation digital content, IBM chose XIV and its NEXTRA™ architecture for its ability to scale dynamically, heal itself in the event of failure, and self-tune for optimum performance, all while eliminating the significant management burden typically associated with rapid growth environments. The architecture also is designed to automatically optimize resource utilization of all the components within the system, which can allow for easier management and configuration and improved performance and data availability.
"We are pleased to become a significant part of the IBM family, allowing for our unique storage architecture, our engineers and our storage industry experience to be part of IBM's overall storage business," said Moshe Yanai, chairman, XIV. "We believe the level of technological innovation achieved by our development team is unparalleled in the storage industry. Combining our storage architectural advancements with IBM's world-wide research, sales, service, manufacturing, and distribution capabilities will provide us with the ability to have these technologies tackle the emerging Web 2.0 technology needs and reach every corner of the world."
The NEXTRA architecture has been in production for more than two years, with more than four petabytes of capacity being used by customers today.
Current disk arrays were designed for online transaction processing (OLTP) databases. The focus was onusing fastest most expensive 10K and 15K RPM Fibre Channel drives, with clever caching algorithmsfor quick small updates of large relational databases. However, the world is changing, and peoplenow are looking for storage designed for digital media, archives, and other Web 2.0 applications.
One problem that NEXTRA architecture addresses is RAID rebuild. In a standard RAID5 6+P+S configuration of 146GB 10K RPM drives, the loss of one disk drive module (DDM) was recovered by reconstructing the data from parity of the other drives onto the spare drive. The process took46 minutes or longer, depending on how busy the system was doing other things. During this time,if a second drive in the same rank fails, all 876GB of data are lost. Double-drive failures are rare,but unpleasant when they happen, and hopefully you have a backup on tape to recover the data from.Moving to slower, less expensive SATA drives made this situation worse. The drives have highercapacity, but run at slower speeds. When a SATA drive fails in a RAID5 array, it could take severalhours to rebuild, and that is more time exposure for a second drive failure. A rebuild for a 750GBSATA drive would take five hours or more,with 4.5 TB of data at risk during the process if a second drive failure occurs.
The Nextra architecture doesn't use traditional RAID ranks or spare DDMs. Instead, data is carved up into 1MBobjects, and each object is stored on two physically-separate drives. In the event of a DDM loss, allthe data is readable from the second copies that are spread across hundreds of drives. New copies aremade on the empty disk space of the remaining system. This process can be done for a lost 750GB drive in under20 minutes. A double-drive failure would only lose those few objects that were on both drives, so perhaps1 to 2 percent of the total data stored on that logical volume.
Losing 1 to 2 percent of data might be devastating to a large relational database, as this could impactthe entire access to the internal structure. However, this box was designed for unstructuredcontent, like medical images, music, videos, Web pages, and other discrete files. In the event of a double-drivefailure, individual files would be recovered, such as with IBM Tivoli Storage Manager backup software.
IBM will continue to offer high-speed disk arrays like the IBM System Storage DS8000 and DS4800 for OLTP applications, and offer NEXTRA for this new surge in digital content of unstructured data. Recognizing this trend, diskdrive module manufacturers will phase out 10K RPM drives, and focus on 15K RPM for OLTP, and low-speedSATA for everything else.
Update: This blog post was focused on the version of XIV box available as of January 2008 that was built by XIV prior to the IBM acquisition. IBM has since made a major revision, made available August 2008 thataddresses a variety of workloads, including database, OLTP, email, as well as digital content and unstructuredfiles. Contact your IBM or IBM Business Partner for the latest details!
Bottom line, IBM continues to celebrate the new year, while the EMC folks in Hopkington, MA will continue to nurse their hangovers. Now that's a good way to start the new year!
Well, it's the last day of the year, and I will be celebrating the new year soon.In the mean time, I leave you with an interesting triple combo related to information.
Nick Carr in his post [Cleaning the Slate] offers a list of articles he did not have time for in 2007.Of these, I enjoyed the 7-page keynote address[Information, Knowledge, Authority and Democracy] by Hunter R. Rawlings III.He talks about the importance of recorded knowledge, including discussions by the US founding fathers Thomas Jefferson and James Madison, and how information is an essential part of democracy.Here's a brief excerpt:
Following the burning of the Capitol in 1815,President James Madison restored the Library of Congress by purchasing ThomasJefferson’s library for the nation. It was Jefferson’s unique classification scheme that thefirst full-time Librarian of Congress, appointed by Madison, used in reorganizing theLibrary. The United States, embodied in the Congress, was to have the best library inthe world because knowledge was necessary to its fundamental purpose, the creationand protection of liberty.
James Madison believed, in other words, that he lived in a “knowledge age.” In ourmyopic way, we like to think that we invented the knowledge age sometime late in the20th century. We did not. Madison and his contemporaries had complete faith andconfidence in the necessity of what they called “useful knowledge,” which, of course,privileged many things we no longer consider useful, such as the ability to read Latinand Greek and to understand the lessons of ancient history.
...by employing collaborative filtering, you use other people’s time to weed out the things that would waste yours. In fact, Del.icio.us and Stumble Upon polls your friends and people with similar interests for the most crucial sources of information and anything else you might have accident skipped over. If The Wisdom of Crowds has taught us anything, it is that a large group of people is drastically more efficient than you’ll ever be on your own.
Unless you enjoy grinding yourself to the bone, use this principle—whether you call it “crowdsourcing” or otherwise—to stop drinking from the information fire hose. It’s not more information, it’s better information, that distinguishes the real winners in business and life.
Finally, Galacticast presents [A Copyright Carol],a humorous 5-minute parody video on what might happen in the future as a result of lawslike the Canadian Digital Millennium Copyright Act[DMCA].
Yesterday, I was able to get the "Build 650" up and running under Qemu emulation onmy Thinkpad laptop computer. Today, I was able to get my Thinkpad and my XO laptoptalking to each other for a "chat".
The built-in "Chat" activity is one of the many kid-friendly activities included onthe XO laptop for the One Laptop Per Child [OLPC] project.It is also possible for two or more people to share other activities, like editing a textdocument, or browsing the internet.
As they say, emulation is only 95% complete, and this is true in this case as well. My Thinkpaddoes not have a built-in video camera, and for some reason the Qemu emulation does not let mehear any sound, despite specifying "-soundhw es1370" parameter. And lastly, it doesn't have the"mesh network" built-in Wi-Fi capability, just standard 54Mbps 802.1g through my Linksys router.
So, I set both XO and Thinkpad to use the new "xochat.org" jabber server so that the two couldsee each other:
$ sugar_control_panel -s jabber xochat.org
I set my XO nickname to be "TonyP" and my Thinkpad to be "Pearson", and chose blue-orange forthe first, and orange-blue for the second.
The process of starting a chat is similar to other IM systems like IBM Lotus Sametime. You havea neighborhood view that shows all people online using the same jabber server. In my case therewere about 30 or so icons on the screen. From the colors on my XO, I was able to locate my Thinkpad,and invite him to a chat. You can share the chat with everyone on the network, or keep it privatebetween two people. I tried both ways to see the difference.
In a private two-way chat, the first person starts up their Chat activity, and sends an inviteto join to another person. The second person sees a flashing chat bubble on the bottom of thescreen to the left of all the other action bar icons. The difference is that the chat bubble isblue-orange matching the sender, rather than black-and-white of the rest of the icons.
If the recipient happens to be busy doing something else full-screen, like browsing the web, theredoesn't seem to be any interruption. It is only when he goes to "home view" will he see the coloredchat bubble and decide to join or not.
The chat itself colorizes the text to match to color of the participant's icons. Blue for one, and orangefor the other. It two people had identical color schemes I guess it might be hard to tell. Thetext is white, so it is best to choose darker colors for contrast.
A nice feature is that you can save your chat session with the "keep" button on the upper rightpart of the screen, and your dialogue discussion will show up as an entry in the "journal".
Using this technique, it is possible for someone who has one "XO" laptop and one regular computer,or two regular computers, to develop and test applications that involve the sharing aspect of educational opportunities. Chats can be between students, student-to-teacher, or event student-to-mentor.
Continuing my week's theme on the XO laptop from the One Laptop Per Child [OLPC] foundation, I successfully managedto emulate my XO on another system.
Part of what is attractive of the XO laptop is the hardware, the high-resolution200dpi screen, the clever screen that rotates and folds flat into an eBook reader,and the water-tight, dust-proof keyboard. The other part is the software, howthey managed to pack an entire operating system, with useful applications, intoa 1GB NAND flash drive.
The drawback for developers like me is the risk of changing something that breaks the system. For example, my first attempt to create my own activityresulted in a blank space in my action bar, and my journal went into someinfinite loop, blinking as if it were still loading for minutes on end. I fixed it by deleting out the activity I created and rebooting.
To get around this, I successfully ran the disk-image under Linux's Virtual Machinesoftware called Qemu. This is an open source offering, with a proprietary add-onaccelerator called Kqemu. Here were the steps involved:
Base Operating System
Qemu is now available to run on Linux, Windows and OS X-Intel. I have an Ubuntu 7.04"Feisty Faun" version of Linux installed on my system from a project I did last year, so decided to use that.
Normally, "apt-get install qemu" would be enough, but I wanted to get the latest release, so I downloaded the [0.9.0 version]tarball of compiled binaries. Note that trying to compile Qemu from source requiresa downlevel gcc-3.x compiler, and my attempts to do this failed. The compiled binariesworked fine.
The Kqemu author hasn't packaged this for distribution, so I download the source code anddid my own compiles. You can do the "configure-make-install" using the regular gcc 4.1compiler and it went smoothly.
Getting Kqemu active was bit of a challenge. I had to make sense of Nando Florestan's[Installing Kqumu in Ubuntu] article,and the subsequent comments that followed.
There is a tiny [8MB Linux image]that should be used to verify the Kqemu is activated correctly.
The Disk Image
As with other development efforts, there are the older stable versions, and the bleedingedge development versions. I chose the 650 Build from the [Ship.2 stable versions], whichmatches the version on my XO laptop. The image comes as a *.bz2, which is a highly-compressedfile. Using "Bunzip2", the 221MB file expands to something like 932MB.
I renamed the resulting file to "build650.img"
Once I got all this done, I then made a simple script "launch" in my /home/tpearson/bin directory:
Then "launch build650.img" was all I needed to run the emulation. The full-screen mode helpsemulate the view on XO laptop. I was able to change the jabber server to "xochat.org" and see otherXO laptops online on my neighborhood view.
When running under Qemu, you can't just press Ctrl-Alt-something. For example, Ctrl-Alt-Erase onthe XO reboots the Sugar interface. However, do this on a Linux system, and it reboots your nativeX interface, blowing away everything.Instead, you press Ctrl-Alt-2 to get to the Qemu console, designated by (qemu) prompt,and then type:
Press "Ctrl-Alt-1" followed by "Ctrl-Alt" to get back to the emulated XO screen.
With this emulation, I am more likely to try new things, change files around, edit system files,and so on, without worrying about rendering my actual XO laptop unusable. Once debugged, I canthen work on moving them over to my XO, one at a time.
Wrapping up this week's theme on the XO laptop, I decided to take on thechallenge of printing. I managed to print from my XO laptop to my laserjet printer.I checked the One Laptop Per Child [OLPC] website,and found there is no built-in support for printers, but there have been several peopleasking how to print from the XO, so here are the steps I did to make it happen.
(Note: I did all of these steps successfully on my Qemu-emulated system first, and then performed them on my XO laptop)
Step 1: Determine if you have an acceptable printer
The XO laptop can only connect to a printer via USB cable or over the network.Check your printer to see if it supports either of these two options. In my case, my printer is connected to my Linksys hub that offers Wi-Fi in my home.
The XO runs a modified version of Red Hat's Fedora 7, so we need to also determineif the printer is supported on Linux.Check the [Open Printing Database]for the level of support. This database has come up with the following ranking system.Printers are categorized according to how well they work under Linux and Unix. The ratings do not pertain to whether or not the printer will be auto-recognized or auto-configured, but merely to the highest level of functionality achieved.
Perfectly - everything the printer can do is working also under Linux
Mostly - work almost perfectly - funny enhanced resolution modes may be missing, or the color is a bit off, but nothing that would make the printouts not useful
Partially - mostly don't work; you may be able to print only in black and white on a color printer, or the printouts look horrible
Paperweight - These printers don't work at all. They may work in the future, but don't count on it
If your printer only supports a parallel cable connection, or does not have a high enough ranking above, go buy another printer. The [Linux Foundation] websiteoffers a list of suggested printers and tutorials.
In my case, I have a Brother HL5250-DN black-and-white laserjet printer connected over a network to Windows XP, OS X and my other Linux systems. It is rated as supporting Linux perfectly, so I decided to use this for my XO laptop.
Step 2: Install Common UNIX Printing System (CUPS)
Technically, Linux is not UNIX, but for our purposes, close enough. Start the Terminalactivity, use "su" to change to root, and then use "yum" to install CUPS. Yum will automatically determine what other packages are needed, in this case paps and tmpwatch. Once installed, use "/usr/sbin/cupsd" to get the CUPS daemon started, and add this to the end ofrc.local so that it gets started every time you reboot.
Click graphic on the left to see larger view
[olpc@xo-10-CC-6F ~]$ subash-3.2# yum install cups...Total download size = 3.0 MIs this OK [y/N]? y
To download the appropriate drivers, you may need a browser that can handle file downloads. I have triedto do this with the built-in Browse activity (aka Gecko) but encountered problems. I have both Opera and Firefox installed, but I will focus on Opera for this effort.I also installed the older220.127.116.11 version of the Flash player (worked better than the latest 18.104.22.168 version) and Java JRE.Follow the OLPC Wiki instructions for [Opera, Adobe Flash,and Sun Java] installation, thenverify with the following [Java and Flash] testers.
Step 4: Download drivers and packages unique for your printer
In my case, I used Opera to get to the [Brother Linux Driver Homepage], and downloaded the RPM's for LPR and CUPS wrapper. These are the ones listed under "Drivers for Red Hat, Mandrake (Mandriva), SuSE". I saved these under "/home/olpc" directory.
By default, the root user has no password. However, you will need it to be something for later steps,so here is the process to create a root password. I set mine to "tony" which normallywould be considered too simple a password, but ignore those messages and continue.We will remove it in step 8 (below) to put things back to normal.
[olpc@xo-10-CC-6F ~]$ subash-3.2# passwdChanging password for user root.New UNIX password: tonyBAD PASSWORD: it is too shortRetype new UNIX password: tonypasswd: all authentication tokens updated successfullybash-3.2# exit[olpc@xo-10-CC-6F ~]$
Step 6: Launch CUPS administration
Here I followed the instructions in Robert Spotswood's [Printing In Linux with CUPS] tutorial.Launch the Opera browser, and enter "http://localhost:631/admin" as the URL. The localhostrefers to the laptop itself, and 631 is the special port that CUPS listens to from browsers. You can alsouse 127.0.0.1 as a shortcut for "localhost", and can be used interchangeably.
In my case, it detected both of my networked printers, so I selected the HL5250DN, entered thelocation of my PPD file "/usr/share/cups/model/HL5250DN.ppd" that was created in Step 4. I set the URI to "lpd://192.168.0.75/binary_p1" per the instructions [Network Setting in CUPS based Linux system] in the Brother FAQ page. I chage the page size from "A4" to "Letter".I set this printer as the default printer. When it asks for userid and password, that is whereyou would enter "root" for the user, and "tony" or whatever you decided to set your root password to.
Select "Print a Test Page" to verify that everything is working.
Step 7: Printing actual files
Sadly, I don't know Opera well enough to know how to print from there. So, I went over to my trustedFirefox browser. Select File->Page Setup to specify the settings, File->Print Preview tosee what it will look like, and then File->Print to send it to the printer.
To print the file "out.txt" that is in your /home/olpc directory, for example, enter"file:///home/olpc/out.txt" as the URL of the firefox browser. This will show the file,which you can then print to your printer. I had to specify 200% scaling otherwise the fontswere too small to read.
Step 8: Remove the "root" password
If you want to remove the root password, here are the steps.
[olpc@xo-10-CC-6F ~]$ suPassword: tonybash-3.2# passwd -d rootRemoving password for user root.passwd: Successbash-3.2# exit[olpc@xo-10-CC-6F ~]$
Now the problem is that there is no way to print stuff from any of the Sugar activities. The best place toput in print support would be the Journal activity. Along the bottom where the mounted USB keys arelocated could be an icon for a printer, and dragging a file down to the printer ojbect could cause it tobe send to the printer.
The alternative is to write some scripts invocable from the Terminal activity to determine what isin the journal, and send them to LPR with the appropriate parameters.
I did not have time to do either of these, but perhaps someone out there can take on that as a project.
Continuing my week's theme on the XO laptop from the One Laptop Per Child [OLPC] project, I have been amused watching the OLPC forum discussion on the choiceof browser options available.
The built-in browser is simple but functional. It is full screen,with back, forward, and bookmark buttons, and an entry field forthe URL. This browser is fully integrated with the Sugar platform,files downloaded will appear in the journal. Download an Activity*.xo file, for example, and you can install it from the Journal.If you want to upload a file, click BROWSE on the website, and theJournal will pop up to choose files from.
Out of the box, the XO supports a minimal Flash that can handlesome Flash-based games but not YouTube videos, and does not supportJava.
The good folks of Opera have built a special edition for the XO laptop.However, some settings need to be changed to make the fonts large enoughto read.
Opera can be run as a Sugar activity, but this just launches a mothertask, which in turn launches a daughter task that actually runs thebrowser. This means that Home View will have two icons. The mothertask has an the Opera icon, but click on it and you get a grey screen.The daughter task appears as a grey circle, click on it and you get thebrowser screen. Alt-Tab will rotate through the Activities, so thegrey screen of the mother task is part of the rotation.
Although Opera has one foot on the Sugar platform, and one foot off,the lack of integration means poor interaction with the journal. The use of Opera is correctly registered. However, downloadingfiles requires a working knowledge of subdirectories, and uploading anythingrequires knowing what it is called, and where it is located. Not obviousfor many of the items created by Sugar applications.
The XO laptop is based on Redhat Fedora distribution, so I downloadedthe Firefox RPM package and installed this. To run, you need to startthe Terminal Activity, and then at the cursor type firefox.Journal only registers that the Terminal activity was used, but not anythingelse.
Since I run Firefox 2.0 on Windows XP, OS X and Linux, I am very familiarwith this browser, and it works as expected. Like Opera, there are shortcut keys, tabs for multiple pages, and optionsto add Java and Flash player. I was able to install add-onsfor Del.icio.us and FireFTP, and they worked as expected. Having accessto FTP sites will make development on the XO much easier.Again all files are uploaded/downloaded to directories, so some workingknowledge of where files are placed is required.
The fonts in Firefox did not expand/shrink as nicely as they had in Opera.Be careful not to select "View->
To close, you have to select File->Quit from the browser window, whichbrings you back to the Terminal activity, which you can then shutdown with Ctrl-Esc.
For now, I will keep all three and continue to evaluate them.I saw a few opportunities for improvement:
The Opera and Terminal icons are not on the first screen.You have to hit the right arrow to get to the "overflow" set of icons. Re-ordering the icons is simply a matter of editing the following file with "vi"(my first few lines I use are shown below):
Put the activities in the order you want. Any activity not listed willappear after these.
It might be possible to create a modified Terminal activity thatinvoked Firefox directly, to eliminate having to type it in each time.
Several people have expressed interest in a browser that runs entirely withthe Xo laptop folded over in eBook/Game mode, such that thekeyboard is completely covered up, exposing only the up-left-right-down arrowsand the Circle/Square/X/Check buttons.
Change the "News Reader" to invoke Bloglines instead. This might be yetanother modified Terminal activity, but borrow the icon from News.
Well, if you have further thoughts on these browsers, enter a comment below.
My XO laptop arrived Friday, December 21, this was from the [Give 1 Get 1 (G1G1)] program fromthe One Laptop Per Child (OLPC) foundation. The program continuesto the end of this month (December 31).
Here are my first impressions.
Setup was Easy
Open the box, put in battery, and plug in the adapter. Enter your name and choose your favorite color for your stick figurine. No passwords, no parameters. Software is pre-installed and ready to use.
The four pages of instructions included how to open the unit (not intuitive), where the various connection ports are located, what the home screen and neighborhood screen look like, safety warnings, and a nice letter from Nicholas Negroponte with an 800 phone number and website in case more help is needed.
Connecting to the internet was the first thing I did. The neighborhood screen shows all the Wi-Fi access points. It recognized mineand three others. I clicked on mine, entered my WEP key, and was connected.
This is a Linux operating system running the Sugar user interface.There are four screens:
Neighborhood - shows all Wi-Fi access points
Friends - shows all other XO laptops nearby, in my case I am all alone
Home - your stick figurine with all the applications you can choose from are represented as icons at the bottom, just like OS X on my Mac Mini, or the launchpad on my Windows XP. Left panel for clipboard items.
Application - Applications run in full-screen mode
Four buttons across the top allow you to jump to any screen instantly.Everything else is single left-click. No double-clicks or right-clicks.
A circle on the home screen designates which applications are running, and how much of the available 256MB RAM they are consuming. This makes it easy to seeif you can run more applications or need to shut something down. Youcan jump to any application, or shut it down, from this view.
Shutting down the XO is done by clicking your stick figurine,and choosing shutdown.
I fired up the browser. The default 'home page' offers some help offline, as well as links to online resources and a google search bar. The full-color 1200x900 is very easy to read. You can hit ctrl+plus to make the fonts bigger. In bright sunlight, the screen turns automatically to greyscale.The built-in browser is easy enough to use, with standard back, forward, re-load, and bookmark buttons. The URL entry field also shows the pages title. It doesn't have tabs to see multiple pages at the same time, but I was able to fire up a second instance of the browser, so thatI could alt-tab back and forth between the two web sites.
There are so many applications that they don't all fit on the bottom of the screen.Left and right tab buttons will display the next set. I don't know if it is possible to re-order the icons, but I can certainly see some applications appealing to different ages, and perhaps re-ordering them into age-specific groups might be helpful.
Basic applications include the Abiword word processor, a PDF viewer, a simple paint program, calculator, chat, and news RSS feed reader; TamTam music to play and edit compositions; and some learn-to-program-a-computer software including Pippy, Etoys, and TurtleArt.
The 'record' program lets you take 640x480 pictures with the built-in camera, up to 45 seconds of video and audio recording. The picture abovewas taken with my XO, and edited online using [snipshot.com]. Another program can be usedto make video calls to another computer, similar to Skype or IBM Lotus Sametime.
The XO has built-in microphone and speakers, but also microphone and speaker ports, as well as three USB ports, and a slot for an SD memory card.
The QWERTY keyboard is designed for small children hands, I found myself using my two index fingers in a hunt-and-peck style. People who use Blackberry's or other hand-held devices might be able to use their two thumbs instead. Also, I am not used to a touchpad as the pointing device. My other laptops have a red knob between the G/H/B keys that acts like a joystick. So, I decided to attach my Apple keyboard/mouse to one USB port, which allows me faster typing and better resolution with my mouse.
I also inserted a 1GB SD card into the slot. Getting to the SD slot was challenging--you have to rotate the screen 90 degrees so that the lower right corner is over the laptop handle. It appears I need to purchase some tweasers to get my SD card back out, so until then, it will remain there as permanent addition to my XO.
A terminal application provides a command line interface into Linux.
The 'vi' editor is installed, in case I need to make changes to fstab or anythingelse in my /etc directory.
There is no S-video or VGA port. However, a teacher could probably fold thislaptop up in e-book mode and lay it flat on an [overhead projector] since the screen can handle bright sunlight in black-and-white mode.
The Journal and the Clipboard
There are no folders or subdirectories here. The journal acts as your desktop, holding all the files you have referenced, sorted in chronological order with the most recent on top. The journal application is started automatically when you boot up.My SD card is shown as a separate entry at the bottom right corner, but I have access only to files on my top-level directory on the card. The journal allows you to drag and drop between the system and the SD flash card.The list can be filtered by file type and application, so finding things is easy.You can also copy anything in the journal to the clipboard, appearing on the leftpanel of the home screen. You can then launch or paste this into other applications.
Pressing Alt-1 takes a 1200x900 snapshot of the current screen, and puts it into the journal.On websites that allow you to upload a file, including GMAIL, snipshot.com, etc. the browse button brings up the journal. So, for example, you could take a snapshot of the current webpage or paint creation, and send it as an attachment to someone via GMAIL. Google has an XO-enabled version of GMAIL that you can download from the OLPC activities page.
This entire post, including the picture above, was done with the XO laptop itself. I am impressed with the thought that went into this design, and I see great potential here. The interface adequately hides the Linux operating system for those who just want to use the computer, but makes it readily accessible for those who want to learn more about the Linux operating system and computer programming.
Well, tomorrow is the Winter solstice, at least for those of us in the Northern hemisphere of the planet.As often happens, I have more vacation days left than I can physically take before they evaporateat the end of the year, so next week I will be off, going to see movies like the new["Golden Compass"]or perhaps read the latest book from [Richard Dawkins].
Next week, I suspect some of the kids on my block will be playing with radio-controlled cars orplanes. If you are not familiar with these, here's a [video on BoingBoing]that shows Carl Rankin's flying machines that he made out of household materials.
Which brings me to the thought of scalability. For the most part, the physics involvedwith cars, planes, trains or sailboats apply at the toy-size level as well as the real-world level. One human operator can drive/manage/sail one vehicle. While I have seen a chess master play seven opponents on seven chess boards concurrently, itwould be difficult for a single person to fly seven radio-controlled airplanes at the same time.
How can this concept be extended to IT administrators in the data center? They have to deal withhundreds of applications running on thousands of distributed servers.In a whitepaper titled [Single System Image (SSI)], the threeauthors write:
A single system image (SSI) is the property of a systemthat hides the heterogeneous and distributed nature of theavailable resources and presents them to users and applicationsas a single unified computing resource.
IBM has some offerings that can help towards this goal.
Even in the case where yourvehicle is being pulled by eight horses--(or eight reindeer?)--a single operator can manage it, holding the reins in both hands. In the same manner,IBM has spent a lot of investment and research into supercomputers, where hundreds of individualservers all work together towards a common task. The operator submits a math problem, for example,and the "system system image" takes care of the rest, dividing the work up into smaller chunksthat are executed on each machine.
When done with IBM mainframes, it is called a Parallel Sysplex. The world's largest business workloadsare processed by mainframes, and connecting several together and working in concert makes this possible.In this case, the tasks are typically just single transactions, no need to divide them up further, justbalance the workload across the various machines, with shared access to a common database and storageinfrastructure so they can all do the work equally.
Last August, in my post [Fundamental Changes for Green Data Centers], I mentioned that IBM consolidated 3900 Intel-based servers onto 33 mainframes. This not only saves lots of electricity, but makes it much easier for the IT administratorsto manage the environment.
Parallel Sysplex configurations often require thousands of disk volumes, which would have been quitea headache dealing with them individually. With DFSMS, IBM was able to create "storage groups" wherea few groups held the data. You might have reasons to separate some data from others, put them inseparate groups. An IT administrator could handle a handful of storage groups much easier than thousandsof disk volumes. As businesses grow, there would be more data in each storage group, but the numberof storage groups remains flat, so an IT administrator could manage the growth easily.
IBM System Storage SAN Volume Controller (SVC) is able to accomplish this for other distributed systems.All of the physical disk space assigned to an SVC cluster is placed into a handful of "managed diskgroups". As the system grows in capacity, more space is added to each managed disk group, but few IT administrators can continue to manage this easily.
The new IBM System Storage Virtual File Manager (VFM) is able to aggregate file systems into one globalname space, again simplifying heterogeneous resources into a single system image. End users have a singledrive letter or mount point to deal with, rather than many to connect to all the disparate systems.
Lastly we get to the actual management aspect of it all. Wouldn't it be nice if your entire data centercould be managed by a hand-held device with two joysticks and a couple of buttons? We're not quite there yet, but last October we announced the [IBM System Storage Productivity Center (SSPC)]. This is a master consolethat has a variety of software pre-installed to manage your IBM and non-IBM storage hardware, includingSAN fabric gear, disk arrays and even tape libraries. It lets the storage admin see the entire data centeras a single system image, displaying the topology in graphical view that can be drilled down using semanticzooming to look at or manage a particular device or component.
Customers are growing their storage capacity on average 60 percent per year. They could do this by havingmore and more things to deal with, and gripe about the complexity, or they can try to grow theirsingle system image bigger, with interfaces and technologies that allow the existing IT staff to manage.
As we wrap up the year, people's thoughts turn to archive anddata retention.
The [Robert Frances Group] have put out a research paper titled Optimizing Data Retention and Archiving - November 2007 that helps IT executives understand the cost differences for a disk-only archive approach versus disk/tape archive approach and how an [IBM System Storage DR550] offering can help address the long-term storage archive requirements with a world-class storage strategy that reduces cost, improves efficiency and supports compliance. Here is an excerpt:
Ongoing legal, audit, and regulatory requirementswill continue to drive IT groups to improvearchive policies, processes, strategy, andefficiency. The choice of which technologies touse will have a profound impact on the success ofsuch efforts, since technologies like the DR550embody many aspects of the strategy, processes,and policies that must be decided upon. When itcomes to tape, IBM's DR550 is unique inproviding that support. Competitors tout disk-onlysolutions as the wave of the future, but researchindicates otherwise. The most basic benefits arecost and mobility, and despite the various vendorproclamations to the contrary, tape is still only afraction of the cost of disk and will remain so inthe foreseeable future.
This paper is yet another nail in the coffin of EMC Centera.In his post [Anyone Naughty on Your List…], Jon W Toigo points to an eBay fire sale of an EMC Centera Gen 4.
There has never been a better time to switch from EMC Centera to theIBM System Storage DR550.
This is a reasonable question. Since Invista 2.0 came out months ago in August, and Invista 2.1 is rumored to be out by end of this month, why put out a press release now, rather than just wait a few weeks? Thesignificant part of this announcement was that EMC finally has their first customer reference.To be fair, getting a customer to agree to be a reference is difficult for any vendor. Some non-profitsand government agencies have rules against it, and some corporations just don't want to be bothered byjournalists, or take phone calls from other prospective customers. I suspect EMC wanted to put the good folks from Purdue University in front of the cameras and microphones before they:
In Moore's terminology, Purdue University would be a "technology enthusiast", interested in exploring the technologyof the EMC Invista. Universities by their very nature often see themselves as early adopters, willing to take big risks in hopes to reap big rewards. The chasm happens later, when there are a lot of early adopters, all willing to be reference accounts. The mainstream market--shown here as pragmatists, conservatives, and skeptics-- are unwillingto accept reference claims from early adopters, searching instead for moderate gains from minimal risks. They prefer references from customers that are similar in size and industry. Whether a vendor can get a product to cross this chasm is the focus of the book.
Why "SAN" virtualization?
Technically, Invista is "storage" virtualization, not "SAN" virtualization. Virtualizationis any technology that makes one set of resources look and feel like a different setof resources, preferably with more desirable characteristics. You can virtualizeservers, SANs, and storage resources.
Virtual SAN (VSAN) technology, supported bythe Cisco MDS 9500 Series Multilayer Director Switch, partitions a single physical SAN into multipleVSANs, allowing different business functions and requirements to share a common physical infrastructure.
How does Invista advance Cisco's VSAN functionality? It doesn't, but that doesn't makethe title a falsehood, or the press release by association full of lies.If you read the entire press release, EMCcorrectly states that Invista is "storage" virtualization. Some storagevirtualization products, like EMC Invista and IBM System Storage SAN Volume Controller (SVC), require a SAN as a platform for which to perform their magic.Marketing people might use the term "SAN" torefer not just the network gear that provides the plumbing, but also to include the storage devices that are attached to the SAN. In that light, theuse of "SAN virtualization" can be understood in the title.
More importantly, it appears that EMC no longer requires that you purchase new SAN equipment from themwith Invista. When the Invista first came out, it cost over a quarter-million US dollars to cover thecost of the intelligent switches, but with the price drop to $100K, I imagine this means theyassume everyone has an appropriately-supported intelligent switch already deployed.
Why this architecture?
In his post [Storage Virtualization and Invista 2.0], EMC blogger ChuckH does a fair job explaining why EMC went in this direction for Invista, and how it is different thanother storage virtualization products.
Most storage virtualization products are cache-based. The world's first disk storagevirtualization product, the IBM 3850 Mass Storage System, introduced in 1974, and thefirst tape virtualization product, the IBM 3494 Virtual tape Server, introduced in 1997, bothused disk cache in front of tape storage. Later virtualization products, like IBM SVC and HDS USP-V, use DRAM memory cache in front of disk storage, but the concept is the same.People are comfortable with cache-based solutions, because the technology is matureand well proven in the marketplace, and excited and delighted that these can offer the following features in a mixed heterogeneous disk environment:
instantaneous point-in-time copy
None of these features are provided by Invista, as there is no cache in the switch. Instead,Invista is a "packet cracker"; it cracks open each FCP packet, inspects and modifies the contents, then passes theFCP packet along to the appropriate storage device. This process slows down each read andwrite by some amount, perhaps 20 microseconds. The disadvantage of slowing down every readand write is offset by having other benefits, like non-disruptive data migration.
To compensate for Invista's inability to provide these features,EMC offers a second solution called EMC RecoverPoint, which is an in-band cache-based appliancesimilar in design to SVC, but maps all virtual disks one-to-one to physical disks. It offersremote distance asynchronous mirroring between heterogeneous devices.EMC supports RecoverPoint in front of Invista, but if you are considering buying bothto get the combined set of features, you might as well buy an IBM SVC or HDS USP-V instead,in one system, rather than two, which is much less complicated. IBM SVC and HDS USP-Vhave both "crossed the chasm" having sold thousands of units to every type and size of customer.
Hopefully, this answers the questions you might have about EMC Invista.
Last year, I covered Chris Anderson's book [The Long Tail]. This year, Chris Anderson, editor-in-chief of Wired.com, has an upcoming book titled Free, the past and future of a radical price. Chris talked about his book here at Nokia World 2007 conference, and the [46-minute video] is worth watching.He asks the big question "What if certain resources were free?" This could be electricity, bandwidth, or storage capacity. He explores how this changes the world, and createsopportunities for new business models. However, many people are stuck in a "scarcity" modeland treat nearly-free resources as expensive, and find themselves doing traditional things thatdon't work anymore. Chris mentions [Second Life] as aneconomy where many resources are free, and seeing how people respond to that.Rather than focusing on making money, new businesses are focused on gainingattention and building their reputation. Here are some example business models:
Cross-subsidy: give away the razors, sell the razor blades; or give away cell phones and sell minutes
Ad-Supported: magazines and newspapers sell for less than production costs
Freemium: 99% use the free version, but a handful pay extra for something more
Digital economics: give away digital music to promote concert tours
Free-sample marketing: give away samples to get word-of-mouth advertising
Gift economy: give people an opportunity and platform to contribute like Wikipedia
Nick Carr writes a post [Dominating the Cloud], indicatingthat IBM, Google, Microsoft, Yahoo and Amazon are the five computing giants to watch, as they are more efficient atconverting electricity into computing than anyone else. Last month, I mentioned IBM and Google partnership on cloud computing in my post[Innovationthat matters: cell phones and cloud computing].Nick's upcoming book titled[The Big Switch] looks into "Utility Computing",comparing the change of companies generating their own electricity to using an electric grid, to the recent developments of cloud computing and software as a service (SaaS). Amazon's latest "SimpleDB" online databaseis cited as an example.
Last, but not least, Seth Godin writes in his post [Meatballs and Permeability] about the bits-vs-atoms issue, what Chris Anderson above refers to as the new digital economy. The idea here is that value carried electronically as bits (digital documents, for example) have completely different economics than value carried as atoms (physical objects), andrequires new marketing techniques. Methods from traditional marketing will not be effective in this new age.Here is a [review] of Seth's new book Meatball Sundae: Is Your Marketing Out of Sync?
All three of these books seem to be covering the same phenomenon, just from different viewpoints. I lookforward to reading them.
While Bill Gates is personally benefiting from code he wrote 30 years ago, most software engineers don't getroyalties for their creative efforts. Robin Harris on StorageMojo has a great piece on [Why are the writers striking?]The writers in this case are those who write scripts for television programs. They get 4 cents for every$19.99 DVD sold today, and want this bumped up to 8 cents. More importantly, they want the same deal forcontent shown over the internet. Currently, they get nothing when content they wrote for is shown on the internet, and they would like that fixed also.
Paying royalties to creative writers encourages them to write good stuff. The best stuff will result in moreroyalties, and we want to encourage this. What about software engineers? Don't we want them to write the beststuff also? Shouldn't they get royalties too, not just a flat salary and continued employment?
From a technology-oriented to a service-oriented approach to IT management
Companies are challenged with shifting from a technology/resource-oriented to a service-oriented approach to IT management. This involves new processes, a new reportingstructure for the IT staff, new tools and technologies, and new data to be captured.A top-down approach is recommended for large organizations, but a bottom-up approachmight be easier to implement for small and medium sized businesses.
IT service management architecture and autonomic computing
IBM has been promoting the concept of Autonomic Computing since 2001. A self-managed resource can have an autonomic manager with sensor and effector. The sensor is used to monitor status, a knowledge basecan analyze and plan for appropriate modifications, and execute these through theeffector. The Autonomic Computing Reference Architecture (ACRA) aligns with the Information Technology Service Management (ITSM) model well, with the CMDB acting asthe knowledge base for the autonomic managers. See my earlier post[Self tuning guitars and storage].
Evolving standards for IT service management
Changes to the IT infrastructure must be closely managed to avoid disruptions.IT organizations recognize that standards-based solutions enable interoperability,with less risk, to connect internal and external applications. Standards can be formally developed by standards bodies like ISO, IETF, W3C, OASIS, and DMTF; or be de facto standards that become widely used by companies, which can then laterbe adopted by standards bodies. SML and SDD are emerging standards that are incompatible with the current set of Web Services-based protocols, like WSDM, but work isunderway to try to determine a unifying standard to support all of these under ITSM.
Prospects for simplifying ITSM-based management through self-managing resources
An ideal computing system would take over a great deal of its own management.Today's IT systems are brittle, difficult to understand, and dangerous to change.The savings from automating some tasks are dwarfed by the irreducible costs of humandecision making, agreements and approvals built in formal processes. A true self-managing, scalable IT system would consist of a number of nearly-identical boxes,with a web interface to define high-level policies and provide information on utilization and performance. As the system needs to expand, it can automatically place the order. When the new boxes arrive, they are placed and connectedinto the data center, and the system configures and provisions them appropriately.
IT Autopilot: A flexible IT service management and deliver platform for smalland medium business
Using an airplane analogy, the pilot performs manual steps to get the plane safelyoff the ground, then turns it over to the autopilot for normal operations. The ITAutopilot intends to do this for IT service management in small and medium business (SMB)that may not have a large dedicated IT staff, using an SOA approach that isloosely coupled, stateless, and adhering to Web Services standards. The IT Autopilotemploys workflow-based controls, the autonomic computing MAPE model, and customizedpolicies to address SMB requirements. It could be deployed as an appliance, similarto IBM System Storage Productivity Center.
I'm continuing my coverage of IBM Systems Journal's [fifteen articles about IBM Service Management].As storage hardware cost per GB declines 25 percent per year, the cost of labor has grown to nearly 70percent of the total IT budget. This brings new focus on how we do things, rather than what things siton the raised floor. Yesterday, my post summarized[the first five articles].Here is what I got out of the next five articles:
Integrated change and configuration management
IT Infrastructure Library (ITIL) best practice covers a variety of disciplines, including incident management,problem management, release management, service help desk, change management, and configuration management.IBM has combined the last two into a single database, and this paper provides insights gained fromimplementing these in practice. A special section talks about how service providers can support multipleclients that must be kept separate from each other.
The process of building a Process Manager: Architecture and design patterns
Business processes coordinate and sequence the work done by a collection of people.Most companies define their business process from scratch, and develop their own applicationsto support their implementation. Process Managers are "out of the box" applications that help customers integrateand automate more quickly than building from scratch. These Process Managers leverage and update informationabout configuration items (CIs) in the configuration management database (CMDB). One of the first developedby IBM was the IBM Tivoli Storage Process Manager.
Integration of domain-specific IT processes and tools in IBM Service Management
ITIL tells you what needs to get done, but it doesn't tell you exactly how to do it. Completing a simplechange request to the IT environment can have a drastic impact on service level agreements (SLAs), utilization of existing storage capacity, and business operations. Sometimes it is important to use multipleProcess Manager applications together. To accomplish this, it is important to launch and land in contextat the appropriate points for smooth transition.
Using a model-driven transformational approach and service-oriented architecture for service deliver management
Companies are considering outsourcing as a way to focus on core competencies. However, the trend is towardselective outsourcing, where the customer controls the IT solution architecture and retains their legacy tools.As a result, service providers inherit the business and IT processes from their clients. IBM Research has developed the model-driven business transformation (MDBT) method that choreographs workflow tools with humanactivities. A "balanced scorecard" allows both client and outsourcer monitor progress towards strategic goals.
Catalog-based service request management
Service providers (outsourcers) are able to bring the latest IT technology, best practices, and skilledservice delivery teams. Unfortunately, unique business processes from each client limits the ability to leveragethese resources effectively. A service delivery management platform (SDMP) catalog serves as a repositoryof atomic services and the delivery teams that can perform them. This allows outsourcers to leverage resourcesacross multiple clients, while still being able to tailor business compositions of these atomic services to an individual client's requirements.
The latest IBM Systems Journal has [fifteen articles about IBM Service Management], which includes the disciplines for managing storage resources as part of an overall IT data center.As with most journals, these articles are heavy academic efforts, not light summer reading.
However, since I have moved from marketing to consulting, I need to read these kinds of articles to keep up with the industry. I realize many people don't have time to read allof these, so over the next three days, I will give some quick highlights in hopefully more understandablelanguage. Here is what I got out of the first five articles:
An Overview of IBM Service Management
This 10-page article provides a good overview of what the other articles go into greater detail.The role of information has changed, from supporting back-office tasks like payroll andinventory, to enabling growth in the business itself, providing insight and competitive advantage. The challenges are summarized under "Four C's": Complexity, Change, Cost, and Compliance. The recommended approach is to engage with IBM,who has thousands of practitioners with years of experience in ITIL, eTOM, COBIT, CMMI and SOA.
Adding value to the IT organization with the Component Business Model
Many Service Level Agreements (SLAs) are loaded with technological jargon rather than concentratingon intended business results. CIOs must change this, and learn to run IT as a business witha service delivery focus.IBM Process Reference Model for IT (PRM-IT) is the foundationfor the Component Business Model for the Business of IT (CBMBoIT) that can assist with strategic decision making to transform IT into this new role.
An Integration model for organizing IT service management
There are so many ways to implement Information Technology Service Management (ITSM) that it is hard to tell if there are gaps or overlaps between products and offerings. A seamless solution requires common terminology and approaches. An integration model helps to bring all this together, focusedon being consistent with existing practice, with clarity of expression, and practical to implement.
IBM Service Management architecture
Today's systems management tools are fragmented by resource domain--servers are managed here, networksmanaged there, and storage is another story altogether. IBM Service Management intends to integratea portal-based User Interface, a process runtime layer, a configuration management database (CMDB), and all the various operational management products (OMPs) for each resource. For example, IBM TotalStorage ProductivityCenter is an OMP for IBM and non-IBM storage resources.
A configuration management database architecture in support of IBM Service Management
IBM Tivoli Change and Configuration Management Database (CCMDB) holds all the configuration data of IT resources in the data center, including individual "configuration items" (CIs), as well as tracks changes. The database is populated with data from different sources, includingautomatic discovery. Relationships between CIs provides a visual representation of application dependencies.The data model uses a clever combination of Unified Modeling Language (UML) with Java persistent objects.