This week is Thanksgiving holiday in the USA, so I thought a good theme would be things I am thankful for.
I'll start by saying that I am thankful EMC finally announced Atmos last week. This was the "Maui" part of the Hulk/Maui rumors we heard over a year ago. To quickly recap, Atmos is EMC's latest storage offering for global-scale storage intended for Web 2.0 and Digital Archive workloads. Atmos can be sold as just software, or combined with Infiniflex, EMC's bulk, high-density commodity disk storage systems. Atmos supports traditional NFS/CIFS file-level access, as well as SOAP/REST object protocols.
I'm thankful for various reasons; here's a quick list:
- It's hard to compete against "vaporware"
Back in the 1990s, IBM was trying to sell its actual disk systems against StorageTek's rumored "Iceberg" project. It took StorageTek some four years to get this project out, but in the meantime, we were comparing actual versus possibility. The main feature is what we now call "Thin Provisioning". Ironically, StorageTek's offering was not commercially successful until IBM agreed to resell this as the IBM RAMAC Virtual Array (RVA).
Until last week, nobody knew the full extent of what EMC was going to deliver on the many Hulk/Maui theories. Several hinted at what it could have been, and I am glad to see that Atmos falls short of those rumored possibilities. This is not to say that Atmos can't reach its potential, and certainly some of the design is clever, such as offering native SOAP/REST access.
Instead, IBM now can compare Atmos/Infiniflex directly to the features and capabilities of IBM's Scale Out File Services [SoFS], which offers a global-scale multi-site namespace with policy-based data movement, IBM System Storage Multilevel Grid Access Manager [GAM] that manages geographically distributed information, and IBM [XIV Storage System] that offers high-density bulk storage.
- Web 2.0 and Digital Archive workloads justify new storage architectures
When I presented SoFS and XIV earlier this year, I mentioned they were designed for the fast-growing Web 2.0 and Digital Archive workloads that were unique enough to justify their own storage architectures. One criticism was that SoFS appeared to duplicate what could be achieved with dozens of IBM N series NAS boxes connected with Virtual File Manager (VFM). Why invent a new offering with a new architecture?
With the Atmos announcement, EMC now agrees with IBM that the Web 2.0 and Digital Archive workloads represent a unique enough "use case" to justify a new approach.
- New offerings for new workloads will not impact existing offerings for existing workloads
I find it amusing that EMC was quick to insist that Atmos will not eat into its DMX business, which is exactly the FUD they threw out about IBM XIV versus DS8000 earlier this year. In reality, neither the DS8000 nor the DMX were used much for Web 2.0 and Digital Archive workloads in the past. Companies like Google, Amazon and others had to either build their own from piece parts, or use low-cost midrange disk systems.
Rather, the DS8000 and DMX can now focus on the workloads they were designed for, such as database applications on mainframe servers.
- Cloud-Oriented Storage (COS)
Just when you thought we had enough terminology already, EMC introduces yet another three-letter acronym [TLA]. Kudos to EMC for coining phrases to help move new concepts forward.
Now, when an RFP asks for Cloud-oriented storage, I am thankful this phrase will help serve as a trigger for IBM to lead with SoFS and XIV storage offerings.
- Digital archives are different from Compliance Archives
EMC was also quick to point out that object-storage Atmos was different from their object-storage EMC Centera, the former being for "digital archives" and the latter for "compliance archives". Different workloads, different use cases, different offerings.
Ever since IBM introduced its [IBM System Storage DR550] several years ago, EMC Centera has been playing catch-up to match IBM's many features and capabilities. I am thankful the Centera team was probably too busy to incorporate Atmos capabilities, so it was easier to make Atmos a separate offering altogether. This allows the IBM DR550 to continue to compete against Centera's existing feature set.
- Micro-RAID arrays, logical file and object-level replication
I am thankful that one of the Atmos policy-based features is replicating individual objects, rather than LUN-based replication and protection. SoFS supports this for logical files regardless of their LUN placement, GAM supports replication of files and medical images across geographical sites in the grid, and the XIV supports this for 1MB chunks regardless of their hard disk drive placement. The 1MB chunk size was based on the average object size from established Web 2.0 and Digital Archive workloads.
I tried to explain the RAID-X capability of the XIV back in January, under much criticism that replication should only be done at the LUN level. I am thankful that Marc Farley on StorageRap coined the phrase [Micro-RAID array] to help move this new concept further. Now, file-level, object-level and chunk-level replication can be considered mainstream.
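Here is a minimal Python sketch of the chunk-level idea, under stated assumptions. It is not XIV's actual distribution algorithm: the hash-based placement rule and the 12-drive example are hypothetical. It splits an object into 1MB chunks and places two copies of each chunk on two different drives.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1MB chunks, the size mentioned above


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split an object into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_mirrored(object_id: str, n_chunks: int, n_drives: int) -> list[tuple[int, int]]:
    """Choose a (primary, secondary) drive pair for every chunk.

    Hypothetical rule: hash (object_id, chunk index) to pick the primary,
    then add a non-zero offset so the copy always lands on a different drive.
    """
    if n_drives < 2:
        raise ValueError("mirroring needs at least two drives")
    placement = []
    for idx in range(n_chunks):
        digest = hashlib.sha256(f"{object_id}:{idx}".encode()).digest()
        primary = int.from_bytes(digest[:8], "big") % n_drives
        offset = 1 + int.from_bytes(digest[8:16], "big") % (n_drives - 1)
        secondary = (primary + offset) % n_drives
        placement.append((primary, secondary))
    return placement


chunks = split_chunks(b"x" * (2 * CHUNK_SIZE + 10))
print(place_mirrored("obj1", len(chunks), n_drives=12))
```

Because the two copies of every chunk sit on different drives, losing any single drive still leaves one copy of each affected chunk, and in this sketch those surviving copies are scattered across many drives rather than concentrated in one RAID group.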
- Much larger minimum capacity increments
The original XIV in January was 51TB capacity per rack, and this went up to 79TB per rack for the most recent IBM XIV Release 2 model. Several complained that nobody would purchase disk systems at such increments. Certainly, small and medium size businesses may not consider XIV for that reason.
I am thankful Atmos offers 120TB, 240TB and 360TB sizes. The companies that purchase disk for Web 2.0 and Digital Archive workloads do purchase disk capacity in these large sizes. Service providers add capacity to the "Cloud" to support many of their end-clients, and so purchasing disk capacity to rent back out represents a revenue-generating opportunity.
- Renewed attention on SOAP and REST protocols
IBM and Microsoft have been pushing SOA and Web Services for quite some time now. REST, which stands for [Representational State Transfer], is an architectural style in which clients read and write resources, identified by URLs, using the standard HTTP verbs such as GET, PUT, POST and DELETE. SOAP, which originally stood for [Simple Object Access Protocol] (as of version 1.2 the name is no longer officially an acronym, though it is sometimes expanded as "Service Oriented Architecture Protocol"), takes this one step further, allowing different applications to send "envelopes" containing messages and data between applications using HTTP, RPC, SMTP and a variety of other underlying protocols. Typically, these messages are simple text surrounded by XML tags, easily stored as files or rows in databases, and served up by SOAP nodes as needed.
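To make the difference concrete, here is a small Python sketch using only the standard library. It wraps a call in a SOAP 1.2 envelope and, separately, builds (without sending) a REST-style HTTP request. The operation name `GetObject`, the `urn:example:storage` namespace and the `storage.example.com` URL are hypothetical placeholders, not the Atmos or SoFS interfaces.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.request import Request

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "urn:example:storage"  # hypothetical application namespace


def soap_envelope(operation: str, params: dict) -> bytes:
    """Wrap an operation and its parameters in a SOAP Envelope/Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def rest_request(base_url: str, object_id: str, data: Optional[bytes] = None) -> Request:
    """Build, but do not send, a REST request: the URL names the resource
    and the HTTP verb (GET to read, PUT to store) names the action."""
    method = "GET" if data is None else "PUT"
    return Request(f"{base_url}/objects/{object_id}", data=data, method=method)


print(soap_envelope("GetObject", {"ObjectId": "abc123"}).decode())
print(rest_request("http://storage.example.com", "abc123").get_method())
```

The SOAP version carries the operation name inside the XML body, while the REST version carries it in the HTTP method and URL, which is one reason REST requests tend to be lighter-weight.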
- It's hard to show leadership until there are followers
IBM's leadership sometimes goes unnoticed until followers create "me, too!" offerings or establish similar business strategies. IBM's leadership in Cloud and Grid computing is no exception. Atmos is the latest me-too product offering in this space, trying pretty much to address the same challenges that SoFS and XIV were designed for.
So, perhaps EMC is thankful that IBM has already paved the way, breaking through the ice on their behalf. I am thankful that perhaps I won't have to deal with as much FUD about SoFS, GAM and XIV anymore.
technorati tags: IBM, SoFS, XIV, GAM, DS8000, EMC, Atmos, Hulk, Maui, Infiniflex, STK, StorageTek, Iceberg, RVA, thin provisioning, VFM, SOAP, REST, DMX, RAID-X, Micro-RAID
In his post on Rough Type titled ["McKinsey surveys the new software landscape"], Nick Carr discusses the growing acceptance in the marketplace for Software-as-a-Service, or SaaS. He summarizes the results of McKinsey's recent [Enterprise Software Customer Survey 2008]. IBM is already well established as part of the Web 2.0 Big "5" (the other four are Google, Yahoo, Amazon and Microsoft), so it may come as no surprise that it has introduced some new offerings focused on this emerging market.
Whether you are looking to contract out for SaaS, or to provide a service to others over the cloud, IBM can help!
- Managed Hosting
For managed hosting, [IBM Managed Storage Services] has been extended to support archive data through its entire lifecycle: supporting access, migration, non-erasable non-rewriteable (NENR) protection, and expiration/destruction. This offering supports locating the storage on the customer premises, a hosting center, or an IBM Service Delivery Center. IBM's blended disk and tape approach provides a better alignment between information value and storage costs.
- Application-Led Service
Last December IBM acquired Arsenal Digital, which offers a remote "Enterprise Email Archive" service, supporting retention policies that can apply per user, per group, or even per message, as needed. This service provides fast user access to email archives, as well as e-discovery search. The search covers not just the email body text, but over 370 different attachment types as well. Deduplication technology is used to reduce the actual amount of storage needed by 80 percent. All of this comes with the security and comfort of knowing that these email archives are encrypted and protected in a disaster-recovery-class datacenter managed by IBM. Blocks and Files presents their thoughts on this in the article ["IBM storing data and mail in the cloud"].
The Radicati Group has published some interesting statistics about email archiving in [Volume 4, Issue 3]. Here's an excerpt:
- "In 2007, a typical corporate email account receives about 18 MB of data per day. This number is expected to grow to over 28 MB by 2011. Today, there is no way to effectively manage these messages, but with the help of an archiving solution.
- Today, the worldwide percentage of corporate mailboxes protected by archiving solutions is estimated to be around 14%, however it is growing at a fast pace, and is expected to reach over 70% by 2011.
- A survey of 102 corporate organizations worldwide, showed that 68% of large businesses view compliance as their top security concern in 2007."
- Cloud Computing
For those who are actually providing these services to others over the cloud, you might want to use the new [IBM System x iDataPlex]. Compared to traditional server environments, the iDataPlex provides five times the computing power by doubling the number of servers per rack, but with 40 percent less energy consumption. Thanks to clever cooling technology, the system can run in standard office "room temperature" environments. You can customize it with a mix of compute, network and storage nodes to meet your application requirements. In addition to Web 2.0 and SaaS workloads, the iDataPlex can be useful for financial risk analysis, high performance computing, and even batch processing.
technorati tags: Rough Type, Nick Carr, McKinsey, SaaS, Google, Yahoo, Amazon, Microsoft, managed hosting, storage services, NENR, archive, IBM, Service Delivery Center, Arsenal Digital, deduplication, Radicati Group, iDataPlex, Web2.0
Well, today is April 1, and I just love [April Fools' Day]. This day has a rich history of practical jokes. Those not familiar can review this list of [Top 100 pranks and hoaxes].
Tim Ferriss started the festivities with [The Grand Illusion: The Real Tim Ferriss speaks]. He claimed that for the past year, he had outsourced the writing of his blog to a writer from India and an editor from the Philippines. Given that his post was dated March 31, and he writes frequently about the benefits of outsourcing, it appeared to be a legitimate post. However, Tim fessed up the following day, claiming that it was April 1 in Japan where he wrote it.
Guy Kawasaki wrote [April Fools' Stories You Shouldn't Believe], including my favorite, #12: "Ruby on Rails cited Twitter as the centerpiece of its new 'Rails Can Scale' marketing program." Speaking of Twitter, fellow IBM blogger Alan Lepofsky from our Lotus Notes team wrote [Great, now there is Twitter Spam]. It looked like a real post, but then I realized, ... everything on Twitter is spam!
Topics like energy consumption and global warming were fodder for posts and pranks. The post [Was Earth Hour a joke again?] argued that the preparation for "Earth Hour" last week in effect used up more energy than the hour of this annual "lights-off event" actually saved. This reminded me of John Tierney's piece in the New York Times ["How virtuous is Ed Begley, Jr.?"], where a scientist explains that it is more "green" for the environment to drive a car short distances than to walk:
If you walk 1.5 miles, Mr. Goodall calculates, and replace those calories by drinking about a cup of milk, the greenhouse emissions connected with that milk (like methane from the dairy farm and carbon dioxide from the delivery truck) are just about equal to the emissions from a typical car making the same trip. And if there were two of you making the trip, then the car would definitely be the more planet-friendly way to go.
Wayan Vota, my buddy over at OLPCnews, writes in his post [Windows XO Child Centric Development] that the "Sugar" operating environment on the innovative Linux-based XO laptops will soon be re-named the "Windows XO Operating System", with their new motto "Windows XO: A Child-Centric Operating Platform for Learning, Expression and Exploration." The mocked-up photo of an XO laptop with the Windows XO logo was excellent!
Gretchen Rubin reminds us that this is a great day to play tricks on your kids in [How April Fool’s day can be a source of happiness], and last week, Kai Ryssdal on NPR investigated whether [Mind Habits] was [a video game that's good for you?] The game claims that playing just five minutes per day can reduce stress. I haven't been able to stop playing after five minutes; Mind Habits is like the proverbial potato chip, you can't just eat one!
The economists from Freakonomics explain in [And While You're at it, Toss the Nickel] that it costs the US Government 1.7 cents to produce each penny. The US government loses $50 million each year making pennies. Each nickel costs 10 cents to produce. This one was dated March 31, so it could actually be true. Sad, but true.
My favorite, however, was EMC blogger Barry Burke's post ["5773 > c"] explaining how their scientists were able to reduce latency on the EMC SRDF disk replication capability:
What the de-dupe team found is that there is a hidden feature within recent generations of this chip that allow a single bit, under certain circumstances, to represent TWO bits of information.
Still, almost 34% of the total bits transferred were in fact aligned double-zeros, far more than all other bit combinations - and most importantly, these were quite frequently byte-aligned, as required by this new-found capability. Makes sense, if you think about it - most of those 32- and 64-bit integers are used to store numbers that are relatively small (years, months, days, credit charges, account balances, etc.). So that's why the team decided to use this new two-fer bit to represent "00".
Mathematically, if you can transmit 34% of the data using half as many bits, you reduce the number of bits you have to transfer in total by 17%. Which, while not necessarily earth-shattering, is nothing to be ashamed of. On top of the SRDF performance enhancements delivered in 5772 (30% reduction in latency or 2x the distance), this new enhancement adds another 17% latency improvement (or ~1.4x more distance at the same latency). Combined with 5772, SRDF/S customers could see a 50% reduction in latency. And 5773 allows SRDF/A cycle times to be set below 5 seconds (with RPQ) - this new feature adds a little headroom to maximize bandwidth efficiency for the shortest possible RPO.
Again, this looked real, until I did the math. Start with the speed of light in a vacuum ("c" in BarryB's title), which is roughly 300,000 kilometers per second or, in more useful units, 300 kilometers per millisecond. However, light travels slower through all other materials, and through fiber optic glass it covers only about 200 kilometers per millisecond. Sending a block of data across 100km, and then getting a response back that it arrived safely, is a total round-trip distance of 200km, so roughly 1 millisecond. However, EMC SRDF often takes two or three round-trips per write, versus IBM Metro Mirror on the IBM System Storage DS8000, which has reduced this to a single round-trip. The number of round-trips has a much bigger effect on latency than EMC's double-bit data compression technique. With IBM, you experience only about 1 millisecond of latency per write for every 100km of distance between locations, the shortest latency in the industry.
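The arithmetic above fits in a few lines of Python. This is a simplified sketch that counts only propagation delay through fiber at roughly 200 km per millisecond; switching, protocol processing and disk service times are deliberately ignored.

```python
SPEED_IN_FIBER_KM_PER_MS = 200.0  # roughly 2/3 of c (about 300 km/ms in a vacuum)


def sync_write_latency_ms(distance_km: float, round_trips: int) -> float:
    """Propagation delay added to each synchronous write.

    Each round trip covers the distance twice (there and back).
    Equipment and protocol overheads are not modeled in this sketch.
    """
    return round_trips * (2 * distance_km) / SPEED_IN_FIBER_KM_PER_MS


for trips in (1, 2, 3):
    print(f"{trips} round trip(s) over 100km: {sync_write_latency_ms(100, trips):.1f} ms")
# 1 round trip(s) over 100km: 1.0 ms
# 2 round trip(s) over 100km: 2.0 ms
# 3 round trip(s) over 100km: 3.0 ms
```

Shrinking the payload only shortens the time needed to serialize bits onto the link, not this propagation delay, which is why the number of round trips dominates.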
It is good to be reminded, at least once a year, to be skeptical of what you read in the blogosphere, and to check the facts!
technorati tags: April Fools Day, Tim Ferris, 4HWW, outsourcing, Guy Kawasaki, Ruby on Rails, Twitter, Alan Lepofsky, Lotus, Notes, Earth Hour, spam, John Tierney, Ed Begley Jr., milk, carbon dioxide, Wayan Vota, OLPCnews, Windows XO, Gretchen Rubin, Kai Ryssdal, Freakonomics, NPR, Mind Habits, penny, nickel, EMC, BarryB, SRDF, IBM, DS8000, Metro Mirror, latency, fiber optic, speed of light
I got some interesting queries about IBM's Scale-Out File Services [SoFS] that I mentioned in my post yesterday [Area rugs versus Wall-to-Wall carpeting]. I thought I would provide some additional details of the product.
SoFS combines three key features: a global namespace, a clustered file system, and Information Lifecycle Management (ILM). Let's tackle each one.
- Global Name Space
A long time ago, IBM acquired a company called Transarc that developed the Andrew File System (AFS) and Distributed File System (DFS). These both provided global namespace capability, meaning that all of your files could be accessible from a single URL file tree. Imagine you have data centers in Tucson, Austin, Raleigh and Chicago. Normally, to access files from each city, you would have to mount a unique IP address for that location, and then to get to files in a different city, you'd have to mount a second, and so on. But with a global namespace, you could mount a single drive letter Z: and access files simply by using Z:/Tucson/abc or Z:/Austin/xyz. IBM uses its DFS to make this happen.
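Conceptually, a global namespace is a lookup from one logical path to the physical location that holds the data. The Python sketch below illustrates that idea with a hard-coded table; the server names and export paths are hypothetical, and a real product would keep this mapping in its own shared metadata rather than in a dictionary.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from the top-level directory of the global namespace
# to the site-local file server and export that actually holds the data.
SITES = {
    "Tucson": ("nas-tucson.example.com", "/export/fs1"),
    "Austin": ("nas-austin.example.com", "/export/fs1"),
    "Raleigh": ("nas-raleigh.example.com", "/export/fs1"),
    "Chicago": ("nas-chicago.example.com", "/export/fs1"),
}


def resolve(global_path: str) -> tuple[str, str]:
    """Translate a path like 'Z:/Tucson/abc' into (server, local path)."""
    # Drop the drive letter, then split '/Tucson/abc' into its components.
    parts = PurePosixPath(global_path.split(":", 1)[-1]).parts
    site, *rest = [p for p in parts if p != "/"]
    server, export = SITES[site]
    return server, str(PurePosixPath(export, *rest))


print(resolve("Z:/Tucson/abc"))
print(resolve("Z:/Austin/xyz"))
```

The user only ever sees the single Z: tree; which server answers is decided by the lookup, so data can move between sites without changing the path users type.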
Just because you have access to a global namespace doesn't give you read/write authority to every file. IBM SoFS has full NTFS Access Control List (ACL) support, so that only users with the appropriate permissions can read or write each file. A "hide unreadable" feature provides what I like to call "parental controls": you don't even see, in your directory listing, any file or subdirectory that you don't have access to. For example, if there is a directory with 50 projects, but you only have authority to three projects, then you only see the three subdirectories related to those projects, and nothing else.
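The "hide unreadable" behavior amounts to filtering a directory listing by the caller's read permission. This Python sketch reduces each ACL to a simple set of user names who may read an entry; real NTFS ACLs (groups, deny entries, inheritance) are much richer, so treat it as an illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    readers: set[str] = field(default_factory=set)  # simplified ACL: who may read


def visible_listing(entries: list[Entry], user: str, hide_unreadable: bool = True) -> list[str]:
    """Return the names a user sees in a directory listing.

    With hide_unreadable on, entries the user cannot read are omitted
    entirely, rather than listed and then refused on access.
    """
    if not hide_unreadable:
        return [e.name for e in entries]
    return [e.name for e in entries if user in e.readers]


# 50 project directories; the hypothetical user "alice" may read only three.
projects = [Entry(f"project{i:02d}", {"alice"} if i in (3, 17, 42) else {"bob"})
            for i in range(50)]
print(visible_listing(projects, "alice"))
```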
There are other ways to get a global namespace. IBM also offers the IBM System Storage N series Virtual File Manager, Brocade offers Storage/X, and F5 acquired Acopia. These all work by putting a box in front of a set of independent NAS storage units, and giving you a single mount point to represent all of the file systems managed behind the scenes. This front-end box, however, can sometimes become a performance bottleneck.
- Clustered File System
Often, when you have a lot of data in one place, you are also expected to deliver that data to lots of clients with relatively good performance. Otherwise, end users revolt and get their own internal direct attach storage. To solve this, you need a clustered architecture that provides access in parallel to the data.
First, we start with a node that is optimized for CIFS and NFS access. We have clocked our node to run CIFS at 577 MB/sec, and NFS at 880 MB/sec, through a 10GbE pipe between a single client and a single SoFS node. Compare that to the 400 MB/sec you get today with 4Gbps FCP, or the 800 MB/sec you will get if you upgrade to 8Gbps FCP, and quickly you recognize that this is comparable performance for demanding workloads.
Then, you combine multiple nodes together, have them all be able to read/write any file in the file system, and front-end that with a load-balancing Virtual IP address (VIPA) that spreads the requests around, and you've got yourself a lean and mean machine for accessing data.
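The virtual IP idea can be sketched in Python as a single front door that hands each new request to the next node. Round-robin is an assumption chosen for simplicity; the text above does not say which distribution policy SoFS uses, and the node names are hypothetical.

```python
from itertools import cycle


class VirtualIP:
    """Toy stand-in for a load-balancing virtual IP address.

    Clients connect to one address; each new request goes to the next node
    in turn. Because every node can read and write any file, any node can
    serve any request.
    """

    def __init__(self, nodes: list[str]):
        if not nodes:
            raise ValueError("need at least one node")
        self._next = cycle(nodes)

    def next_node(self) -> str:
        """Return the node chosen to serve the next request."""
        return next(self._next)


vip = VirtualIP(["node1", "node2", "node3"])
print([vip.next_node() for _ in range(5)])
```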
In 2005, IBM delivered [ASC Purple] with the world's fastest file system. 1536 nodes were able to access billions of files in 2 petabytes of data. The record of 126 GB/sec access to a single file was set, and has yet to be beaten by any other vendor since. This same file system is used in SoFS, as well as a variety of other IBM storage offerings.
The back-end storage can be SAS or FC-attached, from the DS3200 to our mighty DS8300 Turbo, as well as our IBM System Storage DCS9550 and SAN Volume Controller (SVC), and a variety of tape libraries.
- Information Lifecycle Management
Lastly, we get to ILM. With SoFS, you can have different tiers of storage: high-speed SAS or FC disk, low-speed FATA and SATA disk, and even tape. Policy-based automation allows you to place any file onto any disk tier when it is created, and other policies can migrate or delete the data when triggered by certain thresholds, age, or other criteria. The advantage is that this is on a file-by-file basis, so Z:/Tucson/Project could have a bunch of files, some of them on my FC disk, some of them on my SATA, and some on tape. The file path doesn't change when they move, and different files in the same directory can be on different tiers.
Data movement is bi-directional. If you know you will be using a set of files for an upcoming job, say perhaps quarter-end or year-end processing, you can pre-fetch those files from tape and move them to your fastest disk pool.
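The per-file logic of placement, age-based migration and pre-fetch can be sketched in Python. SoFS has its own policy rules, which this does not attempt to reproduce; the tier names and the 90-day and 365-day thresholds here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    age_days: int     # days since last access
    tier: str = "fc"  # placement rule: new files land on fast FC disk


# Hypothetical migration rules, coldest tier first: (minimum age in days, tier).
MIGRATION_RULES = [(365, "tape"), (90, "sata")]


def apply_migration(files: list[FileInfo]) -> None:
    """Move each file to the coldest tier whose age threshold it meets.

    Only the tier changes; the file's path stays the same.
    """
    for f in files:
        for min_age, target in MIGRATION_RULES:
            if f.age_days >= min_age:
                f.tier = target
                break


def prefetch(files: list[FileInfo], prefix: str, tier: str = "fc") -> None:
    """Recall every file under a directory to fast disk ahead of a known job."""
    for f in files:
        if f.path.startswith(prefix):
            f.tier = tier


files = [FileInfo("Z:/Tucson/Project/a.doc", 10),
         FileInfo("Z:/Tucson/Project/b.doc", 120),
         FileInfo("Z:/Tucson/Project/c.doc", 400)]
apply_migration(files)
print([f.tier for f in files])  # three files, same directory, three tiers
prefetch(files, "Z:/Tucson/Project/")  # e.g. before quarter-end processing
print([f.tier for f in files])
```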
There is also integrated backup support. Typically, a large NAS environment is difficult to back up. Traditional methods take days to scan the directory tree looking for files in need of backup. A single SoFS node can scan a billion files in 95 minutes, and 8 nodes in a cluster can scan a billion files in under 15 minutes.
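A quick Python calculation puts these scan figures in perspective. It uses only the numbers quoted above and assumes, for the ideal case, that scanning divides perfectly across nodes.

```python
FILES = 1_000_000_000

single_node_minutes = 95
rate_per_node = FILES / (single_node_minutes * 60)  # files scanned per second
print(f"one node: about {rate_per_node:,.0f} files/sec")

# If scanning scaled perfectly, 8 nodes would need 95 / 8 minutes.
ideal_8_node_minutes = single_node_minutes / 8
print(f"8 nodes, perfect scaling: {ideal_8_node_minutes:.1f} minutes")
```

That works out to roughly 175,000 files per second per node, and an ideal 8-node time of about 12 minutes, so the quoted "under 15 minutes" is consistent with good, if not perfect, scaling.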
Recovery is even more impressive. When you recover, SoFS brings back the entire directory structure first, with all the file names in place. This makes it appear that all the data is restored, but actually it is still on tape. When you access individual files, it will then drive the recovery of that file, so your applications and end users basically determine the priority of the recovery. Traditional methods would wait until every file was restored before letting anyone access the system.
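The "names first, data on demand" recovery can be sketched in Python as stub entries that are filled in on first access. The tape catalog here is just a dictionary standing in for the real backup media; the class and method names are hypothetical.

```python
from typing import Optional


class LazyRestore:
    """Sketch of 'namespace first, data on demand' recovery.

    restore_namespace() recreates every file name immediately as a stub;
    a file's contents are recalled from the (hypothetical) tape catalog only
    when someone reads it, so user access sets the recovery order.
    """

    def __init__(self, tape_catalog: dict[str, bytes]):
        self._tape = tape_catalog
        self._disk: dict[str, Optional[bytes]] = {}
        self.recall_order: list[str] = []

    def restore_namespace(self) -> None:
        for path in self._tape:
            self._disk[path] = None  # stub: name is present, data still on tape

    def listdir(self) -> list[str]:
        return sorted(self._disk)

    def read(self, path: str) -> bytes:
        if self._disk[path] is None:  # first access triggers the recall
            self._disk[path] = self._tape[path]
            self.recall_order.append(path)
        return self._disk[path]


fs = LazyRestore({"a.txt": b"A", "b.txt": b"B", "c.txt": b"C"})
fs.restore_namespace()
print(fs.listdir())  # every name is visible right away
fs.read("c.txt")
print(fs.recall_order)  # only the file actually used has been recalled
```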
SoFS is part of IBM's [Blue Cloud] initiative that was launched in November 2007. Of course, IBM isn't the only one competing in this space. HDS has partnered with BlueArc, HP has acquired PolyServe, and Sun acquired CFS for their Lustre file system. Isilon and Exanet are start-up companies with some offerings. EMC acquired Rainfinity, and has hinted at a Hulk/Maui project that it might deliver later this year or perhaps in 2009, but by then it might be a dollar short and a day late.
But why wait? IBM SoFS is available today and is orders of magnitude more scalable!
technorati tags: IBM, SoFS, Acopia, VFM, Brocade, ILM, global namespace, clustered, file system, disk, tape, storage, system, CIFS, NFS, NAS, NTFS, ACL, DFS, AFS, Transarc, ASC Purple, DS3200, SAS, FC, FCP, DS8300, Turbo, DCS9550, SVC, FATA, SATA, nodes, backup, restore, recovery, Blue Cloud, cloud computing, PolyServe, HDS, BlueArc, HP, Sun, CFS, Lustre, Isilon, Exanet, EMC, Rainfinity, Hulk, Maui
Last week, I covered backup issues in [Deduplication versus Best Practice for Backups]. This week, I thought I would cover issues with email.
At IBM, our standard is to have a limit of 200MB per user mailbox. A few of us get exceptions and have up to a 500MB limit because of the work we do. By comparison, my personal Gmail account is now up to 6500MB. When this limit is exceeded, you are unable to send out any mail until it is brought down below the limit and a request to be "re-enabled for send" is approved, a situation we call "mail jail".
The biggest culprits are attachments. Only 10 percent of emails have attachments, but those that do take up 90 percent of the total space! People attach a 15MB presentation or document, and copy the world on a distribution list. Everyone saves their copy of the note with its attachment, and soon, the limits are blown. Not surprisingly, deduplication has been cited as a "killer app" for email storage, exactly for this reason. If all the users have their mailboxes stored on the same deduplication storage device, it might find these duplicate blocks and manage to reduce the space consumed.
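A minimal Python sketch shows why a widely copied attachment deduplicates so well. It assumes simple fixed-size blocks identified by SHA-256 fingerprints; commercial products may chunk differently, and this version only catches duplicates that line up on the same block boundaries. The 4KB block size is an arbitrary choice for illustration.

```python
import hashlib


class DedupStore:
    """Fixed-size block deduplication: each unique block is stored once,
    and every mailbox keeps only a list of block fingerprints."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}          # fingerprint -> block data
        self.mailboxes: dict[str, list[str]] = {}   # mailbox -> fingerprints

    def save(self, mailbox: str, data: bytes) -> None:
        refs = self.mailboxes.setdefault(mailbox, [])
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # keep only the first copy
            refs.append(fp)

    def logical_bytes(self) -> int:
        """What the users think they are storing."""
        return sum(len(self.blocks[fp]) for refs in self.mailboxes.values() for fp in refs)

    def physical_bytes(self) -> int:
        """What the device actually stores."""
        return sum(len(b) for b in self.blocks.values())


# The same attachment saved by four recipients is stored physically once.
attachment = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(512))
store = DedupStore()
for user in ("ann", "bob", "cho", "dev"):
    store.save(user, attachment)
print(store.logical_bytes(), store.physical_bytes())
```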
A better practice would be to avoid this in the first place. Here are the techniques I use instead:
- Point to the document in a database
We are heavy users of Lotus Notes databases. These can be encrypted and controlled with Access Control Lists (ACL) that determine who can create or read documents in each database. Annually, all the database ACLs are validated so that people can confirm that they continue to have a need-to-know for the documents in each database. Sending a confidential document as a "document link" to a database entry takes only a few bytes, and all the recipients that are already on the ACL have access to that document.
- Point to the document on a web page
If the document is available on an internal or external website, just send the URL instead of attaching the file. Again, this takes only a few bytes. We have websites accessible to all internal employees, websites that can be accessed only by a subset of employees with special permissions and credentials based on their job role, and websites that are accessible to our IBM Business Partners.
In my case, if I happen to have a blog posting that answers a question or helps illustrate an idea, I will send the "permalink" URL of that blog post in my email.
- Point to the document on shared NAS file system
Internally, IBM uses a "Global Storage Architecture" (GSA) based on IBM's Scale-Out File Services [SoFS], with every employee initially getting 10GB of disk space to store files, and the option to request more if needed. The system has policy-based support for placing and migrating older data to tape to reduce actual disk usage, and combines a clustered file system with a global name space.
My SoFS space is now up to 25GB, and I store a lot of presentations and whitepapers that are useful to others. A URL with "ftp://" or "http://" is all you need to point to a file in this manner, and it greatly reduces the need for attachments. I can map my space as "Drive X:" on my Windows system, or as an NFS mount point on my Linux system, which allows me to easily drag files back and forth.
Departments that don't need to offer "worldwide access" use NAS boxes instead, such as the IBM System Storage N series.
Pointing to files in a shared space, rather than as attachments in email, may take some getting used to. I've had a few recipients send me requests such as "can you send that as an attachment (not a URL)" because they plan to read it on the airplane or train, where they won't have online connectivity.
This all relates to new ways for employees to collaborate. Shawn from Anecdote writes in the post[Fostering a Collaboration Culture]:
"Have you invested in the latest and greatest in collaboration technology but still feel people are still not collaborating? How many Microsoft Sharepoint servers and IBM Quickplaces remain relatively untouched or only used by the organization's technorati? I think it's a big problem because this narrow view of collaboration starts to get the concept a bad name: "yeah, we did collaboration but no one used it." And then there the issue of the vast amount of money wasted and opportunities lost. We can't afford to loose faith in collaboration because the external environment is moving in a direction that mandates we collaborate. The problems we face now and into the future will only increase in complexity and it will require teams of people within and across organizations to solve them."
Well, sending pointers instead of attachments works for me, and has kept me out of "mail jail" for quite some time now.
technorati tags: IBM, deduplication, email, mailbox, Gmail, attachment, Lotus, Notes, database, URL, Permalink, GSA, NAS, SoFS, disk, Anecdote
In addition to creating the Dilbert cartoon, Scott Adams has a blog, which is sometimes quite serious, and other times quite funny. The anticipated 30x cost of "Flash Drives" for Enterprise disk systems reminded me of one of Scott's articles from November 2007 titled [Urge to Simplify]. Here's an excerpt:
Now the casinos have people trained, like chickens hoping for pellets, to take money from one machine (the ATM), carry it across a room and deposit in another machine (the slot machine). I believe B.F. Skinner would agree with me that there is room for even more efficiency: The ATM and the slot machine need to be the same machine.
The casinos lose a lot of money waiting for the portly gamblers with respiratory issues to waddle from the ATM to the slot machines. A better solution would be for the losers, euphemistically called “players,” to stand at the ATM and watch their funds be transferred to the hotel, while hoping to somehow “win.” The ATM could be redesigned to blink and make exciting sounds, so it seems less like robbery.
I’m sure this is in the five-year plan. Longer term, people will be trained to set up automatic transfers from their banks to the casinos. People will just fly to Vegas, wander around on the tarmac while the casino drains their bank accounts, then board the plane and fly home. The airlines are already in on this concept, and stopped feeding you sandwiches a while ago.
Perhaps EMC can redesign its DMX-4 to "blink and make exciting sounds" as well. The Flash Drives were designed for the financial services industry, so those disk systems could be directly connected to make transfers between the appropriate bank accounts.
technorati tags: Scott Adams, Dilbert, B.F. Skinner, ATM, casinos, EMC, DMX-4
I'm here at the Los Angeles airport on my way to Canada.
On my post last week [My Blook is Now Available], Cheryl Hagedorn comments:
I've just posted about your blook at Blooking Central http://blooking.blogspot.com/2007/11/inside-system-storage.html
I'll love to hear from you (I post letters from authors!) about how you put the blook together. Many folks have used cut and paste from blog page into word processor. Others have simply backed up their blogs, then cut and pasted. Some folks had the foresight to compose their posts in a word processor before posting!
Anyway, I'd like to know whatever ins and outs you'd like to share. Thanks.
Well Cheryl, I couldn't find an email address to send you a response, so I decided to post here instead and leave a trackback on your blog.
After learning about the Blooker Prize, I had asked our IBM Developerworks team if anyone else within IBM had published a blook, but nobody had heard of anything, so I had to look elsewhere. I got a lot of guidance from Lulu's [Book Publishing FAQs], and Don Campbell's [Five Steps to Publishing Your Paperback Book at Lulu], and how-to articles over at [bookcatcher.com].
- Decision 1: Defining the Container
Before you can cut-and-paste anything, you need a container file to put it in. Here were my key decisions:
- Page Size: Novel 6"x9" (15cm x 23cm) to support both perfect-bound paperback and dust-jacket hardcover editions
- Colors: Full-color covers with black-and-white interior
- Fonts: 10pt Book Antiqua for the text, Courier for the monospaced computer examples, 8pt for the "copyright" fine print
- Format: *.doc Microsoft Word file, using [Lulu's ready-to-use templates]
- Software: Office 2003 version of Microsoft Word on Windows XP system
- Front matter: Title, Copyright, Dedication, Table of Contents, Foreword, Introduction
- Back matter: Blog Roll, Blogging Guidelines, Glossary, Reference table, What people have written about me and my blog
According to Lulu, you could use OpenOffice instead with RTF files. I didn't try that. I did try using CutePDF to upload ready-made PDFs, but that didn't work. I also tried saving text in PDF format on my Mac Mini running OS X 10.4 Tiger, but Lulu didn't like that either. IBM now offers a free download of [Lotus Symphony] that might be an alternative for my next book.
For my blook, the "Blog Roll" serves instead of a more formal [Bibliography]. I could have also includedonline magazines and other web resources.
- Decision 2: Chapter Configuration
I reviewed other blooks to see how they were organized. I thought I might organize the blog posts by topic or category, but all the blooks I looked at were strictly chronological, oldest post first. This, of course, is the exact opposite of how posts appear in a web browser. I decided to keep things simple, with just 12 chapters, one for each calendar month.
Each chapter was separated by a section break with its own footers, and started on an odd-numbered page. The footers put the page numbers on the outside edges, so that even pages have numbers on the left and odd pages on the right. I also added the name of the chapter and the book, like so:
40 ................December 2006| |Inside System Storage.... 41
This was a lot of work, but makes the book look more "professional".
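Word handles this with section breaks and separate odd/even footers. The layout rule itself is simple enough to sketch in Python; the function names below are hypothetical, the book title is taken from the footer example, and the dot leaders are shortened:

```python
def footer(page: int, chapter: str, book: str = "Inside System Storage") -> str:
    """Mirrored footer: the page number always sits on the outside edge."""
    if page % 2 == 0:
        # Even (left-hand) page: number on the left, then the chapter name.
        return f"{page} ....{chapter}"
    # Odd (right-hand) page: book title, then the number on the right.
    return f"{book}.... {page}"


def next_chapter_start(last_page: int) -> int:
    """Chapters start on an odd page, so skip one blank even page if needed."""
    candidate = last_page + 1
    return candidate if candidate % 2 == 1 else candidate + 1


print(footer(40, "December 2006"))   # 40 ....December 2006
print(footer(41, "December 2006"))   # Inside System Storage.... 41
print(next_chapter_start(41))        # 43 (page 42 stays blank)
```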
- Decision 3: Cut-and-Paste
People have asked me why it took three months to put my blook together, and I explained that the cut-and-paste process was manually intensive. My posts are either HTML entered directly into Roller webLogger, or typed in HTML in Windows Notepad and cut-and-pasted over to Roller later. I have access to the HTML source of each post, as well as how it appears on the webpage, and tried cut-and-paste both ways. Copying the HTML source meant having to edit out all the HTML tags. I hadn't even looked into "backing up" all the entries through Roller, but that backup would probably have been HTML source as well.
It turned out that copying the webpage directly from the browser was better, because it retains more of the formatting and automatically eliminates all of the pesky HTML tags. I wanted the printed version to resemble the web page version.
Microsoft Word shows all hyperlinks as bright blue underlined text, which I didn't like, so I removed all hyperlinks to avoid having to pay extra for "colored pages". This can be done manually, one by one, or by pasting with the "text only" option, but that strips out all the other formatting as well. (Specifying a black-and-white interior on Lulu might have converted them to greyscale automatically, so I probably could have left them in, which would have been handy for an online e-book version with active links... oh well.)
To indicate where the hyperlinks would have been, I wrapped all the linked text in [square brackets]. I have now gotten in the habit of doing this in my blog posts, so if I ever make another book, it will cut down the cut-and-paste effort.
Some of the items I linked to posed a problem. I had to convert YouTube videos to flat images of the first frame to include them in the book. Older links were broken, and I had to find the original graphics. I also sent a note to Scott Adams about the use of one of his Dilbert cartoons.
I decided to also cut-and-paste my Technorati tags and comments. For comments I made myself, I labeled them "Addition" or "Response". A few people did not realize that I was "az990tony" making the comments as the blog author, so I changed all of them to say "az990tony (Tony Pearson)" to make this clearer, and I now do this on all future blog posts to minimize the work for my next book.
Because I used a lot of technical terms and acronyms, Microsoft Word actually gave me an error message saying there were so many grammatical and spelling errors that it was unable to track them all, and it would no longer put wavy green or red lines underneath them.
I did all the cut-and-paste work myself, but since the website is publicly accessible, I could have gotten someone else to do this for me. Had I read Timothy Ferriss' book The 4-Hour Workweek sooner, I might have taken his advice on [Outsourcing the project to someone in India]. I might consider doing this for my next book.
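For anyone starting from the HTML source rather than the browser view, the bracket convention can be applied mechanically. This is a minimal sketch, not what I actually did (my cut-and-paste was manual in Word); the helper name is hypothetical, and regular expressions are only adequate for simple, well-formed blog HTML. Messier markup calls for a real parser such as Python's `html.parser`.

```python
import re
from html import unescape

# <a ...>link text</a>; non-greedy so two links in one sentence stay separate.
_LINK = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
# Any remaining tag, such as <b>, <i> or <p>.
_TAG = re.compile(r"<[^>]+>")


def html_to_print_text(html: str) -> str:
    """Wrap former hyperlink text in [square brackets] and drop all other tags."""
    bracketed = _LINK.sub(lambda m: "[" + m.group(1) + "]", html)
    plain = _TAG.sub("", bracketed)
    return unescape(plain)  # turn &amp;, &#39; and friends back into characters


print(html_to_print_text('See <a href="http://example.com">Lulu&#39;s templates</a>.'))
# See [Lulu's templates].
```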
- Decision 4: Numbering the Posts
I decided I wanted to standardize the title of each post. The date was not unique enough, as there were days when I made multiple posts. So I decided to assign each post a unique number, from 001 to 165, like so:
2006 Dec 12 - The Dilemma over future storage formats (033)
Posts that referred back to one of my earlier posts within the book had (#nnn) added, so that readers could jump back to them if they were interested. This eliminated having to keep track of page numbers.
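The numbering scheme, combined with the one-chapter-per-month layout, amounts to sorting oldest-first, numbering sequentially, and grouping by month. Here is a small Python sketch of that idea. The `Post` class and `number_posts` function are hypothetical, and the sketch assumes each post carries a full timestamp so that same-day posts keep their order:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    published: datetime  # full timestamp, so same-day posts sort correctly
    title: str


def number_posts(posts):
    """Number posts oldest-first as 001, 002, ... and group them into monthly chapters.

    Returns an insertion-ordered dict mapping chapter name -> list of titles.
    """
    chapters = {}
    ordered = sorted(posts, key=lambda p: p.published)
    for n, post in enumerate(ordered, start=1):
        title = f"{post.published:%Y %b %d} - {post.title} ({n:03d})"
        chapters.setdefault(f"{post.published:%B %Y}", []).append(title)
    return chapters
```

With one post from November 2006 and the December 12 post above, the December chapter would contain `2006 Dec 12 - The Dilemma over future storage formats (002)`.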
- Decision 5: Adding behind-the-scenes commentary
One of the reasons I rent or buy DVDs is for the director's audio commentary and deleted scenes. These extras provide added value over what I saw in the movie theatre. Likewise, 80 percent of a blook is already out in public for anyone to read, so I felt I needed to provide some added value. At the beginning of each month, I describe what was going on behind the scenes, and then in front of specific posts, I provide additional context. This could be what was going on in the blogosphere at the time, announcements or acquisitions that happened, what country I was blogging from, or unannounced products or projects that were being developed that I can now talk about, since they are now announced and available.
To distinguish these side comments from the rest of the blog posts, I decorated them with graphics. Searching for copyright-free/royalty-free clip-art, graphics, and photos that represented each concept was time-consuming. I shrank each one to about 1 inch square and changed them from color to greyscale. (Lulu's conversion to PDF probably would have converted the color graphics to greyscale for me automatically, in which case leaving them in full color might have been nice for an e-book edition... oh well.)
I completed each chapter one at a time. So, for each month, I cut-and-pasted all the blog posts, tags and comments, then fixed up and numbered all the post titles, then added all the behind-the-scenes commentary, and cleaned up all the font styles and sizes. I recommend you do this at least for the first chapter, so you get a good feel for what the finished version will look like.
- Decision 6: Adding a Glossary
I sent early copies of the book to five of my coworkers knowledgeable about storage, and five local friends who know nothing about storage.
Some of my early reviewers suggested having an index, so that people can find a specific post on a particular topic. Others suggested I spell out all the acronyms that appear everywhere and put them into the Reference section, rather than on each and every occurrence in the book itself. Both were good ideas, and my IBM colleague Mike Stanek suggested calling it a GOAT (Glossary of Acronyms and Terms). Acronyms are spelled out, and terms or phrases that need additional explanation have a glossary definition. For each item, I listed the post or posts that use that term. Some terms are covered in dozens of posts, so I tried to pick five or fewer posts representing the most pertinent.
The glossary was far more time-consuming than I first imagined, with over 50 pages containing over 900 entries. I struggled to decide which terms and acronyms needed explanation, and which were obvious enough. On the good side, it forced me to read and re-read the entire book cover to cover, and I caught a lot of other mistakes, misspellings, and formatting errors that way. Also, I have a large international readership on my blog, so the glossary will help those for whom English is not their native language, as well as readers who are not necessarily experts in the storage industry.
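Choosing which posts to list for each term was a judgment call on my part. Still, a first-pass cross-reference can be generated automatically and then pruned by hand. The Python sketch below makes two assumptions: "most pertinent" is approximated by how often a term appears in a post, and the post text is available as plain strings keyed by post number. The function name is hypothetical.

```python
import re
from collections import Counter


def build_glossary_index(terms, posts, max_refs=5):
    """Map each term to at most `max_refs` post numbers that mention it.

    `posts` maps post number -> plain text. Posts mentioning the term most
    often are preferred (a stand-in for "most pertinent"); ties go to the
    earlier post. The chosen numbers are returned in ascending order.
    """
    index = {}
    for term in terms:
        # Whole-word, case-insensitive match. Terms ending in symbols such
        # as "C++" would need a different boundary rule.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = Counter()
        for number, text in posts.items():
            count = len(pattern.findall(text))
            if count:
                hits[number] = count
        ranked = sorted(hits, key=lambda n: (-hits[n], n))
        index[term] = sorted(ranked[:max_refs])
    return index
```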
- Decision 7: Designing the Covers
Up to this point, I had been printing early drafts with simple solid-color covers. Lulu has three choices for covers:
- Just type in the text, upload an "author's photo" and choose a background color or pattern
- Upload PNG files, one for the front cover and one for the back cover, and choose the text and color of the spine.
- Upload a single one-piece PDF file that wraps around the entire book.
I had no software to generate the PDF for the third option, so I decided to try the second option. My first attempt was to format the front title page in Word, capture the screen, convert it to PNG and upload it as the front cover. I did the same for the back cover, with a small picture of me and some paragraphs about the book.
I chose a simple, straightforward title on purpose. Thousands of IBM and other IT marketing and technical people will be ordering this book and submitting their expenses for reimbursement as work-related, and I didn't want to cause problems with a cute title like "An Engineer in Marketing La-La Land".
The next step was to use [the GIMP] GNU Image Manipulation Program, similar to Photoshop, to add a cream-colored background, a slanted green spine, and some graphics that we had developed professionally for some of our IBM presentations. I learned how to use the GIMP when making tee-shirts and coffee mugs for our [Second Life] events, so I was already familiar with it. I suggest that new blook authors learn how to use it for their covers, or find someone who can do this for them.
I did the paperback version first, and once that was done, it was easy to use the same PNG files for the dust jacket of the hardcover edition, adding some extra words for the front and back flaps.
The adage "Don't judge a book by its cover" seems to apply to everything except books themselves. The book cover is the first impression, both online and in a bookstore. I have seen people pick books up off the shelf at my local Barnes & Noble, read the front and back covers, peruse the front and back flaps, and make a purchase decision without ever flipping a single page of the contents inside. From an article on Book Catcher, [SELF-PUBLISHING BOOK PRODUCTION & MARKETING MISTAKES TO AVOID]:
According to selfpublishingresources website, three-fourths of 300 booksellers surveyed (half from independent bookstores and half from chains) identified the look and design of the book cover as the most important component of the entire book. All agreed that the jacket is the prime real estate for promoting a book.
While many authors struggle to find the right title and cover art, I think it is interesting that Lulu lets you post the same book with slightly different titles and covers, each as a separate project, and let market forces decide which one people like best. This is a common practice among market research firms.
- Decision 8: Finding someone to write the Foreword
With the book nearly done, I thought it would be a nice touch to have an IBM executive write a Foreword at the front of the book. Several turned me down, so I am glad I found a prominent worldwide IBM executive to do it. She wanted to read my book in its entirety before putting pen to paper, which I had not planned for. I was hoping to be done by the end of October, but waiting for her to finish writing the Foreword added some extra weeks. Next time, I will start this process sooner.
- Decision 9: Printing Early Drafts
You need to have Lulu print at least one copy to review before making it available to the public, and it doesn't hurt to order a few intermediate draft copies to make sure everything looks right. However, with standard shipping it takes over two weeks from the time I order a copy on Lulu to the time it is in my hands, so I needed a way to print drafts to look at in between.
To avoid wear-and-tear on my color ink-jet printer, I bought a large black-and-white [Brother HL-5250DN] laser printer. Rather than buying specialty 6x9 paper, I used standard 8.5x11 paper with the following 2-up duplex method:
- Upload the DOC file to Lulu, and get it converted to PDF
- Download the resulting PDF from Lulu back to your computer
- View the PDF in Adobe Reader, and print it using 2-up "Booklet" mode.
For example, if you print 60 pages in booklet mode, it prints two mini-pages on the front side and two more mini-pages on the back side of each sheet, resulting in 15 sheets of standard 8.5" x 11" paper that can be folded, stapled, and read like a mini-booklet. My entire blook could be printed as seven of these mini-booklets, saving paper and giving me a close approximation of what the final book would look like. Each mini-page is 5.5" x 8.5", just slightly smaller than the final 6" x 9" form factor. I found that 60 pages (15 sheets) was about the maximum before a booklet becomes hard to fold in half.
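Adobe Reader's Booklet mode does the page shuffling for you. The order it has to compute is what makes a folded stack read in sequence, and it also explains the 60-page/15-sheet arithmetic. The Python sketch below assumes a single nested signature (all sheets folded together and stapled at the fold) and is an illustration of saddle-stitch imposition, not Adobe's code:

```python
import math


def booklet_sheets(num_pages):
    """Saddle-stitch imposition for 2-up duplex booklet printing.

    Each sheet carries four mini-pages: (left, right) on the front and
    (left, right) on the back. The page count is padded up to a multiple
    of four, and padding pages are returned as None (blank).
    """
    total = math.ceil(num_pages / 4) * 4

    def page(n):
        return n if n <= num_pages else None

    sheets = []
    for i in range(total // 4):
        front = (page(total - 2 * i), page(2 * i + 1))
        back = (page(2 * i + 2), page(total - 2 * i - 1))
        sheets.append((front, back))
    return sheets


print(len(booklet_sheets(60)))  # 15 sheets, as in the example above
print(booklet_sheets(8))        # [((8, 1), (2, 7)), ((6, 3), (4, 5))]
```

The outermost sheet carries the first and last pages. That is why a page count that is not a multiple of four ends up with blank pages at the back of the booklet.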
So, if I had to do it all over again, I might have chosen 11pt Garamond (the default), or changed the default to 11pt Book Antiqua up front, so as not to spend so much time converting the fonts. I might have left out the glossary. I might have left in all the hyperlinks and graphics in full color for a separate e-book edition. And I definitely would have looked for an author for my Foreword much earlier in the process.
I didn't plan to write a blook when I started blogging, but now I have started putting [square brackets] around all my links, and "az990tony (Tony Pearson)" on all my comments. I had assumed that people were jumping to all the links I provided in context, but I learned that each blog post has to stand on its own, so now I make sure that I either paraphrase the important parts or actually quote the text that I feel is important. This is perhaps good advice in general, but it is even more important if you plan to write a blook later.
Lastly, I decided up front to write blog posts that were 500-700 words long, about the average length of magazine or newspaper articles. In my blook, the average is 639 words per post, so I hit that goal. I have seen some blogs where each post is just a few sentences. Maybe they are posting from their cell phones, or don't have time to think out a full thought, but who wants to read a year's worth of [Twitter] entries?
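Checking posts against a length target like this is simple arithmetic. Here is a minimal Python sketch with hypothetical helper names. It assumes the posts are already plain text and counts whitespace-separated words, so it will only approximate the count a word processor reports:

```python
def average_words(posts):
    """Mean words per post, counting whitespace-separated tokens."""
    if not posts:
        return 0.0
    return sum(len(text.split()) for text in posts) / len(posts)


def within_target(average, low=500, high=700):
    """True if the average falls inside the 500-700 word goal."""
    return low <= average <= high
```

For example, `within_target(639)` is `True`.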
Well Cheryl, I hope that helps. If you need any more, click on the "email" box on the right panel.
technorati tags: Cheryl Hagedorn, Blooking Central, Lulu, Don Campbell, IBM, Developerworks, Book Antiqua, Courier, Garamond, Microsoft, Word, OpenOffice, Lotus, Symphony, PDF, CutePDF, OS X, HTML, Hyperlinks, blook, reference, glossary, Twitter, Timothy Ferriss, fourhourworkweek, outsourcing, India
Continuing my week's theme on Innovations that matter, I thought I would tackle energy efficiency and the recent excitement over the Smart car.
USA Today had an article, [America crazy about breadbox on wheels called Smart car]. This car weighs only 2400 pounds, gets a respectable 33 MPG city and 40 MPG highway, and has a list price of $11,590 US dollars. These have been in Europe for some time now. The "Smart" name comes from combining the S from Swatch, the M from Mercedes and ART. The car was designed by Nicolas Hayek, founder of the SWATCH wristwatch line, and manufactured by Daimler, which also makes Mercedes cars.
We have many communities here in Tucson where people drive street-legal golf carts. Many people don't realize it, but both electric and electric/gas hybrid golf carts have been around for a long time. Some of the nicer golf carts run about $7,000 US dollars, with a shelf on the back that can hold two sets of golf clubs, or groceries. Of course, you would never take a golf cart on the highway, so that is where the Smart car comes in: with a 10-gallon tank, it could easily get you from one major city to another.
Like golf carts, the Smart-for-Two model being sold in the US holds only two people, which is perfect for many American families. The standard 4-person or 5-person sedan is too big for most DINKS (Dual Income, No Kids), while families with kids often opt for a 7-person SUV instead.
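Taking the figures quoted above at face value (a 10-gallon tank, 33 MPG city and 40 MPG highway), the range on one tank works out roughly as:

$$
10\ \text{gal} \times 40\ \frac{\text{mi}}{\text{gal}} = 400\ \text{mi (highway)}, \qquad
10\ \text{gal} \times 33\ \frac{\text{mi}}{\text{gal}} = 330\ \text{mi (city)}
$$

Actual range will depend on driving conditions, but that is the basis for the city-to-city claim.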
It is good to see that energy consumption is finally getting the attention it deserves. IBM recently announced some exciting offerings to help data centers manage their energy consumption:
- IBM Systems Director Active Energy Manager V3.1 [AEM]:
A new, key component of IBM's [Cool Blue portfolio] offering, AEM helps clients manage and even potentially lower energy costs. According to Gartner, insufficient power and excessive heat remain the greatest challenges in the data center. With AEM, IT managers can understand exact power/cooling costs, manage the efficiency of the current environment and reduce energy costs. AEM is the only energy management software tool that can provide clients with a single view of the actual power usage across multiple IBM platforms, including x86, blades, Power and storage systems, with plans to extend support to the mainframe.
- IBM Usage and Accounting Manager Virtualization Edition V7.1 [UAV] for System p and System x:
UAV gives IT managers more information to manage data center costs. These powerful usage management tools are designed to accurately measure, analyze, and report resource utilization of virtualized/consolidated/shared resources. With UAV, IT managers can better manage costs and justify new systems by determining who is using how much of which resource; assessing the cost of an IT service or application; and accurately charging each user or department. Working with AEM capabilities, it will also allow tracking of energy consumption costs by server and by user. This level of reporting eliminates a key inhibitor to the adoption of virtualization and consolidation and further differentiates IBM systems.
- IBM Tivoli Usage and Accounting Manager [UAM]:
This solution, ideal for heterogeneous IT shops, serves as an accurate measurement tool underlying billing processes and SLA compliance. UAM provides usage-based accounting and charging for virtually any IT resource across the enterprise, ranging from mainframes to virtualized servers to storage networks and more. The Usage and Accounting Manager Virtualization offerings seamlessly integrate into it.
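The core idea behind usage-based charging is proportional allocation: a shared cost, such as a month's data center power bill, is split according to metered consumption. The following Python sketch shows that generic technique only. It is not the AEM, UAV or UAM interface, and the department names and figures are made up for illustration:

```python
def allocate_cost(total_cost, usage):
    """Split a shared cost across consumers in proportion to metered usage.

    `usage` maps consumer -> units consumed (kWh, CPU-hours, GB-months...).
    Returns consumer -> charge, rounded to cents; rounding residue is not
    redistributed in this simple sketch.
    """
    total_usage = sum(usage.values())
    if total_usage == 0:
        return {name: 0.0 for name in usage}
    return {name: round(total_cost * used / total_usage, 2)
            for name, used in usage.items()}


# Hypothetical example: a $1,000 energy bill, metered 30/70 by department.
print(allocate_cost(1000.0, {"finance": 300, "web": 700}))
# {'finance': 300.0, 'web': 700.0}
```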
Whether you are trying to reduce energy consumption in your data center, or in your transportation around town, these innovations can help you stay "green".
technorati tags: Smart Car, USAToday, golf cart, street legal, hybrid, MPG, green, energy, IBM Systems, Director, AEM, UAV, UAM, TUAM, SLA, management, virtualization, DINKS, SUV
A few weeks ago, my Tivo(R) digital video recorder (DVR) died. All of the digital clocks in my house were flashing 12:00, so I suspect it was a power surge while I was at the office. The only other item to die was the surge protector, so it did what it was supposed to do: give up its own life to protect the rest of my equipment. Somehow, though, it did not protect my Tivo.
I opened a problem ticket with Sony, and they sent me instructions on how to ship it to another state to get it repaired. Amusingly, the instructions included "Please make a backup of the drive contents before sending the unit in for repair." Excuse me? How am I supposed to do that, exactly?
My model has only a single 80GB drive, so my friend and I removed the drive and attached it to one of our other systems to see if anything was salvageable. It failed every diagnostic test. There simply was not enough readable data left to be usable elsewhere.
This is typical of many home systems. They are not designed for robust usage, high availability, or any form of backup/recovery process. Some of the newer models have two drives in a RAID-1 configuration, but most have many single points of failure.
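To show what those newer two-drive models buy you, here is a toy RAID-1 mirror in Python. The class is hypothetical and ignores rebuilds, timing and partial writes. Every block is written to both drives, and a read succeeds as long as one copy survives. Note that a mirror protects against a single drive failure, not against a power surge that damages the whole unit, so a separate backup copy still matters.

```python
class Mirror:
    """Toy RAID-1: each block is written to two drives; reads survive one failure."""

    def __init__(self):
        self.drives = [{}, {}]        # block number -> data, one dict per drive
        self.failed = [False, False]

    def write(self, block, data):
        # Mirror the write to every drive that is still working.
        for drive, dead in zip(self.drives, self.failed):
            if not dead:
                drive[block] = data

    def fail(self, index):
        # A dead drive's contents are treated as unreadable.
        self.failed[index] = True
        self.drives[index] = {}

    def read(self, block):
        for drive, dead in zip(self.drives, self.failed):
            if not dead and block in drive:
                return drive[block]
        raise IOError(f"block {block} lost: no surviving copy")
```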
And certainly, it is not mission-critical data. Life goes on without the last few episodes of Jack Bauer on "24", or the various Food Network shows that I recorded for items I plan to bake some day. For the past few weeks, I have spent more time listening to the radio and reading books. Somehow, even though my television runs fine without my Tivo, watching TV in "real time" just isn't the same.
I suspect that if you gave people a method to do the backup, most would not bother to use it. Yet people are relying more and more heavily on their home-based information storage systems for digital music, video and cherished photographs. Perhaps experiencing a "loss" will help them appreciate backup/recovery systems so much more than they do today.
technorati tags: Tivo, Digital Video Recorder, DVR, RAID, backup, recovery, loss, information, storage, systems
A recent blog post by Chris Mellor advances the outlandish conspiracy theory that IBM and HDS copied virtualisation technology from small start-up company DataCore.
(Chris doesn't actually name his source for such a claim, or say whether that someone was employed by any of the parties involved at the time the events occurred, or is currently employed by a competitor like EMC, bitterly jealous of the success IBM and HDS currently enjoy with their offerings.)
As I have posted before about IBM's long history of storage virtualization, SAN Volume Controller was really part of a sequence of major products in this area, following the successful 3850 MSS and 3494 VTS block virtualization products.
In the late 1990s, our research teams in Almaden, California and Hursley, UK were exploring storage technologies that could take advantage of commodity hardware parts and the industry-leading Linux operating system.
As is often the case, while IBM was working on "the perfect product", small start-ups announced "not-yet-perfect" products into the marketplace. Partnering with DataCore was a smart tactical move, for the following reasons:
- Helps identify market segments. Identify which subset of customers would most benefit from disk virtualization. While our 3850 MSS and 3494 VTS were focused on mainframe customers, this new technology was focused on distributed Unix, Windows and Linux servers.
- Helps prioritize market requirements. What are the most appealing features? What drives clients to buy disk virtualization for distributed systems platforms?
- Helps evaluate packaging options. Should we deliver pure software and expect customersto purchase their own servers? Should we offer this as a "service offering" with installation anddeployment services included? Should we offer this as hardware with software pre-installed?
The partnership proved valuable, not just in showing IBM that this was a worthwhile market to enter, but also in showing how "NOT" to package a solution. Specifically, DataCore SANsymphony was software that you had to install on your own Windows-based server. The client was left with the task of ordering a suitable Intel-based server with the right amount of CPU cycles, RAM and host bus adapter ports, and configuring the Windows operating system and DataCore software.
It didn't go well. Basically, customers were expected to be their own "hardware engineers", having to know way too much about storage hardware and software to design a combination that worked for their workloads. Most clients were disappointed with the amount of effort involved, and with the resulting poor performance.
To fix this, IBM delivered the SAN Volume Controller, with an optimized Linux operating system and internally written software running on IBM System x(tm) server hardware optimized for performance.
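This post does not describe SVC's internals, so the following is only a toy Python illustration of what block virtualization means in general. A virtual disk presented to hosts is assembled from fixed-size extents carved out of back-end managed disks, and every block address is translated through a mapping table. The class and function names, the round-robin allocation policy and the extent size are all assumptions for illustration, not IBM's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualDisk:
    """Toy virtual disk: a list of fixed-size extents, each mapped to
    (managed disk name, extent number on that managed disk)."""
    extent_size: int                        # blocks per extent
    extents: list = field(default_factory=list)

    def locate(self, virtual_block):
        """Translate a virtual block address to (managed disk, physical block)."""
        index, offset = divmod(virtual_block, self.extent_size)
        mdisk, physical_extent = self.extents[index]
        return mdisk, physical_extent * self.extent_size + offset


def stripe(num_extents, mdisks, extent_size):
    """Allocate extents round-robin across managed disks (assumes free space)."""
    next_free = {m: 0 for m in mdisks}
    vdisk = VirtualDisk(extent_size)
    for i in range(num_extents):
        m = mdisks[i % len(mdisks)]
        vdisk.extents.append((m, next_free[m]))
        next_free[m] += 1
    return vdisk
```

With `stripe(4, ["mdisk0", "mdisk1"], 1024)`, virtual block 1030 falls in the second extent and maps to block 6 of `mdisk1`. The host never sees that indirection, which is what lets the back-end disks be changed underneath it.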
I can't speak for HDS, but I suspect they came to similar conclusions that resulted in a similar decision to build their product in-house. I welcome Hu Yoshida to correct me if I am wrong on this.
technorati tags: Chris Mellor, DataCore, SANsymphony, IBM, SVC, HDS, EMC, Invista, disk, storage, virtualization, Hu Yoshida, Windows, Linux