In my post yesterday [Spreading out the Re-Replication process
], fellow blogger BarryB [aka The Storage Anarchist
]raises some interesting points and questions in the comments section about the new IBM XIV Nextra architecture.I answer these below not just for the benefit of my friends at EMC, but also for my own colleagues within IBM,IBM Business Partners, Analysts and clients that might have similar questions.
- If RAID 5/6 makes sense on every other platform, why not so on the Web 2.0 platform?
Your attempt to justify the expense of Mirrored vs. RAID 5 makes no sense to me. Buying two drives for every one drive's worth of usable capacity is expensive, even with SATA drives. Isn't that why you offer RAID 5 and RAID 6 on the storage arrays that you sell with SATA drives?Let's take a look at various disk configurations, for example 3TB on 750GB SATA drives:
And if RAID 5/6 makes sense on every other platform, why not so on the (extremely cost-sensitive) Web 2.0 platform? Is faster rebuild really worth the cost of 40+% more spindles? Or is the overhead of RAID 6 really too much for those low-cost commodity servers to handle.
- JBOD: 4 drives
- JBOD here is industry slang for "Just a Bunch of Disks" and was invented as the term for "non-RAID".Each drive would be accessible independently, at native single-drive speed, with no data protection. Puttingfour drives in a single cabinet like this provides simplicity and convenience only over four separate drivesin their own enclosures.
- RAID-10: 8 drives
- RAID-10 is a combination of RAID-1 (mirroring) and RAID-0 (striping). In a 4x2 configuration, data is striped across disks 1-4,then these are mirrored across to disks 5-8. You get performance improvement and protection against a singledrive failure.
- RAID-5: 5 drives
- This would be a 4+P configuration, where there would be four drives' worth of data scattered across fivedrives. This gives you almost the same performance improvement as RAID-10, similar protection againstsingle drive failure, but with fewer drives per usable TB capacity.
- RAID-6: 6 drives
- This would be a 4+2P configuration, where the first P represents linear parity, and the second represents a diagonal parity. Similar in performance improvement as RAID-5, but protects against single and double drive failures, and still better than RAID-10 in terms of drives per TB usable capacity.
For all the RAID configurations, rebuild would require a spare drive, but often spares are shared among multiple RAID ranks, not dedicated to a single rank. To this end, you often have to have several spares per I/O loop, and a different set of spares for each kind of speed and capacity. If you had a mix of 15K/73GB, 10K/146GB, and 7200/500GB drives, then you would have three sets of spares to match.
In contrast, IBM XIV's innovative RAID-X approach doesn't requireany spare drives, just spare capacity on existing drives being used to hold data. The objects can be mirroredbetween any two types of drives, so no need to match one with another.
All of these RAID levels represent some trade-off between cost, protection and performance, and IBM offers each of theseon various disk systems platforms. Calculating parity is more complicated than just mirrored copies, but this can be done with specialized chips in cache memory to minimize performance impact.IBM generally recommends RAID-5 for high-performance FC disk, and RAID-6 for slower, large capacity SATA disk.
However, the questionassumes that the drive cost is a large portion of the overall "disk system" cost. It isn't. For example,Jon Toigo discusses the cost of EMC's new AX4 disk system in his post [National Storage Rip-Off Day]:
- EMC is releasing its low end Clariion AX4 SAS/SATA array with 3TB capacity for $8600. It ships with four 750GB SATA drives (which you and I could buy at list for $239 per unit). So, if the disk drives cost $956 (presumably far less for EMC), that means buyers of the EMC wares are paying about $7700 for a tin case, a controller/backplane, and a 4Gbps iSCSI or FC connector. Hmm.
- Dell is offering EMC’s AX4-5 with same configuration for $13,000 adding a 24/7 warranty.
(Note: I checked these numbers. $8599 is the list price that EMC has on its own website. External 750GB drivesavailable at my local Circuit City ranged from $189 to $329 list price. I could not find anything on Dell'sown website, but found [The Register] to confirm the $13,000 with 24x7 warranty figure.)
Disk capacity is a shrinking portion of the total cost of ownership (TCO). In addition to capacity, you are paying forcache, microcode and electronics of the system itself, along with software and services that are included in the mix,and your own storage administrators to deal with configuration and management. For more on this, see [XIV storage - Low Total Cost of Ownership].
- EMC Centera has been doing this exact type of blob striping and protection since 2002
As I've noted before, there's nothing "magic" about it - Centera has been employing the same type of object-level replication for years. Only EMC's engineers have figured out how to do RAID protection instead of mirroring to keep the hardware costs low while not sacrificing availability.
I agree that IBM XIV was not the first to do an object-level architecture, but it was one of the first to apply object-level technologies to the particular "use case" and "intended workload" of Web 2.0 applications.
RAID-5 based EMC Centera was designed insteadto hold fixed-content data that needed to be protected for a specific period of time, such as to meet government regulatory compliance requirements. This is data that you most likelywill never look at again unless you are hit with a lawsuit or investigation. For this reason, it is important to get it on the cheapest storage configuration as possible. Before EMC Centera, customers stored this data on WORM tape and optical media, so EMC came up with a disk-only alternative offering.IBM System Storage DR550 offers disk-level access for themost recent archives, with the ability to migrate to much less expensive tape for the long term retention. The end result is that storing on a blended disk-plus-tape solution can help reduce the cost by a factor of 5x to 7x, making RAID level discussion meaningless in this environment. For moreon this, see my post [OptimizingData Retention and Archiving].
While both the Centera and DR550 are based on SATA, neither are designed for Web 2.0 platforms.When EMC comes out with their own "me, too" version, they will probably make a similar argument.
- IBM XIV Nextra is not a DS8000 replacement
Nextra is anything but Enterprise-class storage, much less a DS8000 replacement. How silly of all those folks to suggest such a thing.
I did searches on the Web and could not find anybody, other than EMC employees, who suggested that IBM XIV Nextra architecture represented a replacement for IBM System Storage DS8000. The IBM XIV press release does not mentionor imply this, and certainly nobody I know at IBM has suggested this.
The DS8000 is designed for a different "use case" andset of "intended workloads" than what the IBM XIV was designed for. The DS8000 is the most popular disk systemfor our IBM System z mainframe platform, for activities like Online Transaction Processing (OLTP) and large databases, supporting ESCON and FICON attachment to high-speed 15K RPM FC drives. Web 2.0 customers that might chooseIBM XIV Nextra for their digital content might run their financial operations or metadata search indexes on DS8000.Different storage for different purposes.
As for the opinion that this is not "enterprise class", there are a variety of definitions that refer to this phrase.Some analysts look at "price band" of units that cost over $300,000 US dollars. Other analysts define this as beingattachable to mainframe servers via ESCON or FICON. Others use the term to refer to five-nines reliability, havingless than 5 minutes downtime per year. In this regard, based on the past two years experience at 40 customer locations,I would argue that it meets this last definition, with non-disruptive upgrades, microcode updates and hot-swappable components.
By comparison, when EMC introduced its object-level Centera architecture, nobody suggested it was the replacement for their Symmetrix or CLARiiON devices. Was it supposed to be?
- Given drive growth rates have slowed, improving utilization is mandatory to keep up with 60-70 percent CAGR
Look around you, Tony- all of your competitors are implementing thin provisioning specifically to drive physical utilization upwards towards 60-80%, and that's on top of RAID 5/RAID 6 storage and not RAID 1. Given that disk drive growth rates and $/GB cost savings have slowed significantly, improving utilization is mandatory just to keep up with the 60-70% CAGR of information growth.
Disk drive capacities have slowed for FC disk because much of the attention and investment has been re-directed to ATA technology. Dollar-per-GB price reduction is slowing for disks in general, as researchers are hitting physicallimitations to the amount of bits they can pack per square inch of disk media, and is now around 25 percent per year.The 60-70 percent Compound Annual Growth Rate (CAGR) is real, and can be even growing faster for Web 2.0providers. While hardware costs drop, the big ticket items to watch will be software, services and storage administrator labor costs.
To this end, IBM XIV Nextra offers thin provisioning and differential space-efficient snapshots. It is designed for 60-90 percent utilization, and can be expanded to larger capacities non-disruptively in a very scalable manner.
Well, I hope that helps clear some things up.
technorati tags: IBM, XIV, Nextra, EMC, BarryB, RAID-0, RAID-1, RAID-5, RAID-6, RAID-10, RAID-X, AX4, Dell, AX4-5, FC, SAS, SATA, iSCSI, TCO, blob, object-level, disk, storage, system, Centera, ESCON, FICON, Symmetrix, CLARiiON, ATA, CAGR, Web2.0
On his The Storage Architect
blog, Chris Evans wrote [Twofor the Price of One
]. He asks: why use RAID-1 compared to say a 14+2 RAID-6 configuration which would be much cheaper in terms of the disk cost?
Perhpaps without realizing it, answers itwith his post today [XIV part II
So, as a drive fails, all drives could be copying to all drives in an attempt to ensure the recreated lost mirrors are well distributed across the subsystem. If this is true, all drives would become busy for read/writes for the rebuild time, rather than rebuild overhead being isolated to just one RAID group.
Let me try to explain. (Note: This is an oversimplification of the actual algorithm in an effortto make it more accessible to most readers, based on written materials I have been provided as partof the acquisition.)
In a typical RAID environment, say 7+P RAID-5, you might have to read 7 drives to rebuild one drive, and in the case of a 14+2 RAID-6, reading 15 drives to rebuild one drive. It turns out the performance bottleneck is the one driveto write, and today's systems can rebuild faster Fibre Channel (FC) drives at about 50-55 MB/sec, and slower ATA disk at around 40-42 MB/sec. At these rates, a 750GB SATA rebuild would take at least 5 hours.
In the IBM XIV Nextra architecture, let's say we have 100 drives. We lose drive 13, and we need to re-replicate any at-risk 1MB objects.An object is at-risk if it is the last and only remaining copy on the system. A 750GB that is 90 percent full wouldhave 700,000 or so at-risk object re-replications to manage. These can be sorted by drive. Drive 1 might have about 7000 objects that need re-replication, drive 2might have slightly more, slightly less, and so on, up to drive 100. The re-replication of objects on these other 99 drives goes through three waves.
- Wave 1
Select 49 drives as "source volumes", and pair each randomly with a "destination volume". For example, drive 1 mapped todrive 87, drive 2 to drive 59, and so on. Initiate 49 tasks in parallel, each will re-replicate the blocks thatneed to be copied from the source volume to the destination volume.
- Wave 2
50 volumes left.Select another 49 drives as "source volumes", and pair each with a "destination volume". For example, drive 87 mapped todrive 15, drive 59 to drive 42, and so on. Initiate 49 tasks in parallel, each will re-replicate the blocks thatneed to be copied from the source volume to the destination volume.
- Wave 3
Only one drive left. We select the last volume as the source volume, pair it off with a random destination volume,and complete the process.
Each wave can take as little as 3-5 minutes. The actual algorithm is more complicated than this, as tasks complete early the source and volumes drives are available for re-assignment to another task, but you get the idea. XIV hasdemonstrated the entire process, identifying all at-risk objects, sorting them by drive location, randomly selectingdrive pairs, and then performing most of these tasks in parallel, can be done in 15-20 minutes. Over 40 customershave been using this architecture over the past 2 years, and by now all have probably experienced at least adrive failure to validate this methodology.
In the unlikely event that a second drive fails during this short time, only one of the 99 task fails. The other 98 tasks continue to helpprotect the data. By comparison, in a RAID-5 rebuild, no data is protected until all the blocks are copied.
As for requiring spare capacity on each drive to handle this case, the best disks in production environments aretypically only 85-90 percent full, leaving plenty of spare capacity to handle re-replication process. On average,Linux, UNIX and Windows systems tend to only fill disks 30 to 50 percent full, so the fear there is not enough sparecapacity should not be an issue.
The difference in cost between RAID-1 and RAID-5 becomes minimal as hardware gets cheaper and cheaper. For every $1 dollar you spend on storage hardware, you spend $5-$8 dollars managing the environment. As hardware gets cheaper still, it might even be worth making three copies of every 1MB object, the parallel processto perform re-replications would be the same. This could be done using policy-based management, some data gets triple-copied, and other data gets only double-copied, based on whether the user selected "premium" or "basic" service.
The beauty of this approach is that it works with 100 drives, 1000 drives, or even a million drives. Parallel processingis how supercomputers are able to perform feats of amazing mathematical computations so quickly, and how Web 2.0services like Google and Yahoo can perform web searches so quickly. Spreading the re-replication process acrossmany drives in parallel, rather than performing them serially onto a single drive, is just one of the many uniquefeatures of this new architecture.
technorati tags: Chris Evans, RAID-1, RAID-5, RAID-6, performance, bottleneck, FC, SATA, disk, system, IBM, XIV, Nextra, objects, re-replication, spare capacity
Wrapping up my week's theme on IBM's acquisition XIV, we have gotten hundreds of positive articles and reviews in the press, but has caused quite a stir with the[Not-Invented-Here
] folks at EMC.We've heard already from EMC bloggers [Chuck Hollis
] and [Mark Twomey
].The latest is fellow EMC blogger BarryB's missive [Obligatory "IBM buys XIV" Post
], which piles on the "Fear, Uncertainty and Doubt" [FUD
], including this excerpt here:
In a block storage device, only the host file system or database engine "knows" what's actually stored in there. So in the Nextra case that Tony has described, if even only 7,500-15,000 of the 750,000 total 1MB blobs stored on a single 750GB drive (that's "only" 1 to 2%) suddenly become inaccessible because the drive that held the backup copy also failed, the impact on a file system could be devastating. That 1MB might be in the middle of a 13MB photograph (rendering the entire photo unusable). Or it might contain dozens of little files, now vanished without a trace. Or worst yet, it could actually contain the file system metadata, which describes the names and locations of all the rest of the files in the file system. Each 1MB lost to a double drive failure could mean the loss of an enormous percentage of the files in a file system.
And in fact, with Nextra, the impact will be across not just one, but more likely several dozens or even hundreds of file systems.
Worse still, the Nextra can't do anything to help recover the lost files.
Nothing could be further from the truth. If any disk drive module failed, the system would know exactly whichone it was, what blobs (binary large objects) were on it, and where the replicated copies of those blobs are located. In the event of a rare double-drive failure, the system would know exactly which unfortunate blobs were lost, and couldidentify them by host LUN and block address numbers, so that appropriate repair actions could be taken from remote mirrored copies or tape file backups.
Second, nobody is suggesting we are going to put a delicateFAT32-like Circa-1980 file system that breaks with the loss of a single block and requires tools like "fsck" to piece back together. Today's modern file systems--including Windows NTFS, Linux ext3, and AIX JFS2--are journaled and have sophisticated algorithms tohandle the loss of individual structure inode blocks. IBM has its own General Parallel File System [GPFS] and corresponding Scale out File Services[SOFS], and thus brings a lotof expertise to the table.Advanced distributed clustered file systems, like [Google File System] and Yahoo's [Hadoop project] take this one step further, recognizing that individual node and drive failures at the Petabyte-scale are inevitable.
In other words, XIV Nextra architecture is designed to eliminate or reduce recovery actions after disk failures, not make them worse. Back in 2003, when IBM introduced the new and innovative SAN Volume Controller (SVC), EMCclaimed this in-band architecture would slow down applications and "brain-damage" their EMC Symmetrix hardware.Reality has proved the opposite, SVC can improve application performance and help reduce wear-and-tear on the manageddevices. Since then, EMC acquired Kashya to offer its own in-band architecture in a product called EMC RecoverPoint, that offers some of the features that SVC offers.
If you thought fear mongering like this was unique to the IT industry, consider that 105years ago, [Edison electrocuted an elephant]. To understand this horrific event, you have to understand what was going on at the time.Thomas Edison, inventor of the light bulb, wanted to power the entire city of New York with Direct Current(DC). Nikolas Tesla proposed a different, but more appropriate architecture,called Alternating Current(AC), that had lower losses over distances required for a city as large and spread out as New York. But Thomas Edison was heavily invested in DC technology, and would lose out on royalties if ACwas adopted.In an effort to show that AC was too dangerous to have in homes and businesses, Thomas Edison held a pressconference in front of 1500 witnesses, electrocuting an elephant named Topsy with 6600 volts, and filmed the event so that it could be shown later to other audiences (Edison invented the movie camera also).
Today's nationwide electric grid would not exist without Alternating Current.We enjoy both AC for what it is best used for, and DC for what it is best used for. Both are dangerous at high voltage levels if not handled properly. The same is the case for storage architectures. Traditional high-performance disk arrays, like the IBM System Storage DS8000, will continue to be used for large mainframe applications, online transaction processing and databases. New architectures,like IBM XIV Nextra, will be used for new Web 2.0 applications, where scalability, self-tuning, self-repair,and management simplicity are the key requirements.
(Update: Dear readers, this was meant as a metaphor only, relating the concerns expressed above thatthe use of new innovative technology may result in the loss or corruption of "several dozen or even hundreds of file systems" and thus too dangerous to use, with an analogy on the use of AC electricity was too dangerous to use in homes. To clarify, EMC did not re-enact Thomas Edison's event, no animalswere hurt by EMC, and I was not trying to make political commentary about the current controversy of electrocution as amethod of capital punishment. The opinions of individual bloggers do not necessarily reflect the official positions of EMC, and I am not implying that anyone at EMC enjoys torturing animals of any size, or their positions on capital punishment in general. This is not an attack on any of the above-mentioned EMC bloggers, but rather to point out faulty logic. Children should not put foil gum wrappers in electrical sockets. BarryB and I have apologized to each other over these posts for any feelings hurt, and discussion should focus instead on the technologies and architectures.)
While EMC might try to tell people today that nobody needs unique storage architectures for Web 2.0 applications, digital media and archive data, because their existing products support SATA disk and can be used instead for these workloads, they are probably working hard behind the scenes on their own "me, too" version.And with a bit of irony, Edison's film of the elephant is available on YouTube, one of the many Web 2.0 websites we are talking about. (Out of a sense of decency, I decided not to link to it here, so don't ask)
technorati tags: IBM, XIV, EMC, BarryB, FUD, Nextra, blob, Thomas Edison, Nikolas Tesla, Web2.0, scalability, Petabyte-scale, self-tuning, self-repair, DS8000, disk, systems, Topsy, elephant, light bulb, movie camera, invention, DC, AC, YouTube
Yesterday's announcement that IBM had acquired XIV to offer storage for Web 2.0 applicationsprompted a lot of discussion in both the media and the blogosphere. Several indicated thatit was about time that one of the major vendors stepped forward to provide this, and it madesense that IBM, the leader in storage hardware marketshare, would be the first. Others were perhaps confused on what is unique with Web 2.0 applications. What has changed?
I'll use this graphic to help explain how we have transitioned through three eras of storage.
- The first era: Server-centric
In the 1950s, IBM introduced both tape and disk systems into a very server-centric environment.Dumb terminals and dumb storage devices were managed entirely by the brains inside the server.These machines were designed for Online Transaction Processing (OLTP), everywhere from bookingflights on airlines to handling financial transfers.
- The second era: Network-centric
In the 1980s and 1990s, dumb terminals were replaced with smarter workstations and personalcomputers; and dumb storage were replaced with smarter storage controllers. Local Area Networks (LANs)and Storage Area Networks (SANs) allowed more cooperative processing between users, servers andstorage. However, servers maintained their role as gatekeepers. Users had to go through aspecific server or server cluster to access the storage they had access to. These servers continuedtheir role in OLTP, but also manage informational databases, file sharing and web serving.
- The third era: Information-centric
Today, we are entering a third era. Servers are no longer the gatekeepers. Smart workstationsand personal computers are now supplemented with even more intelligent handheld devices, Blackberryand iPhones, for example. Storage is more intelligent too, with some being able to offer file sharingand web serving directly, without the need of an intervening server. The roles of servers have changed,from gatekeepers, to ones that focuses on crunching the numbers, and making information presentableand useful.
Sam Palmisano, CEO and chairman of IBM, first introduced this in March 2006 as the [Globally Integrated Enterprise],but the concept applies to organizations of all sizes, from large multi-nationals to the local [Mom and Pop shops].
Here is where Web 2.0 applications, digital media and archives fits in. These are focused on unstructured data that don't require relational database management systems. So long as the useris authorized, subscribed and/or has made the appropriate payment, she can access the information. With the appropriate schemes in place, information can now be mashed-up in a variety of ways, combined with other information that can render insights and help drive new innovations.
Of course, we will still have databases and online transaction processing to book our flights andtransfer our funds, but this new era brings in new requirements for information storage, and newarchitectures that help optimize this new approach.
technorati tags: IBM, XIV, Web2.0, server-centric, network-centric, information-centric, OLTP, database, disk, tape, systems, dumb terminal, workstations, storage controller, LAN, SAN, digital media, archive, servers, handheld, devices, file sharing, web serving, insight, innovation
So here we are in January, named after the two-faced Roman god Janus, who in their mythology was the god of gates and doors, and beginnings and endings.
-- Roger von Oech[Our "Janus-Like" Powers]
Well, it's 2008, which could mark the end to RAID5 and mark the beginnings of a new disk storagearchitecture. IBM starts the year with exciting news, acquiring new disk technology from a smallstart-up called XIV, led by former-EMCer Moshe Yanai. Moshe was ousted publicly in 2001 from hisposition as EMC's VP of engineering, and formed his own company. It didn't take long for EMC bloggersto poke fun at this already. Mark Twomey, in his StorageZilla blog, had mentioned XIV before back in August,[XIV], and again todayin [IBM Buys XIV].
The following is an excerpt from the [IBM Press Release]:
To address the new requirements associated with next generation digital content, IBM chose XIV and its NEXTRA™ architecture for its ability to scale dynamically, heal itself in the event of failure, and self-tune for optimum performance, all while eliminating the significant management burden typically associated with rapid growth environments. The architecture also is designed to automatically optimize resource utilization of all the components within the system, which can allow for easier management and configuration and improved performance and data availability.
"We are pleased to become a significant part of the IBM family, allowing for our unique storage architecture, our engineers and our storage industry experience to be part of IBM's overall storage business," said Moshe Yanai, chairman, XIV. "We believe the level of technological innovation achieved by our development team is unparalleled in the storage industry. Combining our storage architectural advancements with IBM's world-wide research, sales, service, manufacturing, and distribution capabilities will provide us with the ability to have these technologies tackle the emerging Web 2.0 technology needs and reach every corner of the world."
The NEXTRA architecture has been in production for more than two years, with more than four petabytes of capacity being used by customers today.
Current disk arrays were designed for online transaction processing (OLTP) databases. The focus was onusing fastest most expensive 10K and 15K RPM Fibre Channel drives, with clever caching algorithmsfor quick small updates of large relational databases. However, the world is changing, and peoplenow are looking for storage designed for digital media, archives, and other Web 2.0 applications.
One problem that NEXTRA architecture addresses is RAID rebuild. In a standard RAID5 6+P+S configuration of 146GB 10K RPM drives, the loss of one disk drive module (DDM) was recovered by reconstructing the data from parity of the other drives onto the spare drive. The process took46 minutes or longer, depending on how busy the system was doing other things. During this time,if a second drive in the same rank fails, all 876GB of data are lost. Double-drive failures are rare,but unpleasant when they happen, and hopefully you have a backup on tape to recover the data from.Moving to slower, less expensive SATA drives made this situation worse. The drives have highercapacity, but run at slower speeds. When a SATA drive fails in a RAID5 array, it could take severalhours to rebuild, and that is more time exposure for a second drive failure. A rebuild for a 750GBSATA drive would take five hours or more,with 4.5 TB of data at risk during the process if a second drive failure occurs.
The Nextra architecture doesn't use traditional RAID ranks or spare DDMs. Instead, data is carved up into 1MBobjects, and each object is stored on two physically-separate drives. In the event of a DDM loss, allthe data is readable from the second copies that are spread across hundreds of drives. New copies aremade on the empty disk space of the remaining system. This process can be done for a lost 750GB drive in under20 minutes. A double-drive failure would only lose those few objects that were on both drives, so perhaps1 to 2 percent of the total data stored on that logical volume.
Losing 1 to 2 percent of data might be devastating to a large relational database, as this could impactthe entire access to the internal structure. However, this box was designed for unstructuredcontent, like medical images, music, videos, Web pages, and other discrete files. In the event of a double-drivefailure, individual files would be recovered, such as with IBM Tivoli Storage Manager backup software.
IBM will continue to offer high-speed disk arrays like the IBM System Storage DS8000 and DS4800 for OLTP applications, and offer NEXTRA for this new surge in digital content of unstructured data. Recognizing this trend, diskdrive module manufacturers will phase out 10K RPM drives, and focus on 15K RPM for OLTP, and low-speedSATA for everything else.
Update: This blog post was focused on the version of XIV box available as of January 2008 that was built by XIV prior to the IBM acquisition. IBM has since made a major revision, made available August 2008 thataddresses a variety of workloads, including database, OLTP, email, as well as digital content and unstructuredfiles. Contact your IBM or IBM Business Partner for the latest details!
Bottom line, IBM continues to celebrate the new year, while the EMC folks in Hopkington, MA will continue to nurse their hangovers. Now that's a good way to start the new year!
technorati tags: Janus, two-faced, Roman god, Roger Von Oech, IBM, RAID5, XIV, EMC, Moshe Yanai, Mark Twomey, StorageZilla, NEXTRA, double-drive failure, rebuild, HDD, DDM, HDD, digital content, unstructured data
Well, it's the last day of the year, and I will be celebrating the new year soon.In the mean time, I leave you with an interesting triple combo related to information.
- The Past
Nick Carr in his post [Cleaning the Slate] offers a list of articles he did not have time for in 2007.Of these, I enjoyed the 7-page keynote address[Information, Knowledge, Authority and Democracy] by Hunter R. Rawlings III.He talks about the importance of recorded knowledge, including discussions by the US founding fathers Thomas Jefferson and James Madison, and how information is an essential part of democracy.Here's a brief excerpt:
Following the burning of the Capitol in 1815,President James Madison restored the Library of Congress by purchasing ThomasJefferson’s library for the nation. It was Jefferson’s unique classification scheme that thefirst full-time Librarian of Congress, appointed by Madison, used in reorganizing theLibrary. The United States, embodied in the Congress, was to have the best library inthe world because knowledge was necessary to its fundamental purpose, the creationand protection of liberty.
James Madison believed, in other words, that he lived in a “knowledge age.” In ourmyopic way, we like to think that we invented the knowledge age sometime late in the20th century. We did not. Madison and his contemporaries had complete faith andconfidence in the necessity of what they called “useful knowledge,” which, of course,privileged many things we no longer consider useful, such as the ability to read Latinand Greek and to understand the lessons of ancient history.
- The Present
Tim Ferriss in his post [12 Filtering Tips for Better Information] discusses[Ryan Holiday] and his ["collaborative filtering"] suggestions on howto deal with the tidal wave of information that arrives at you every day. Thisincludes the use of an RSS feed reader, Stumble Upon, and del.icio.us websites. Here's an excerpt:
...by employing collaborative filtering, you use other people’s time to weed out the things that would waste yours. In fact, Del.icio.us and Stumble Upon polls your friends and people with similar interests for the most crucial sources of information and anything else you might have accident skipped over. If The Wisdom of Crowds has taught us anything, it is that a large group of people is drastically more efficient than you’ll ever be on your own.
Unless you enjoy grinding yourself to the bone, use this principle—whether you call it “crowdsourcing” or otherwise—to stop drinking from the information fire hose. It’s not more information, it’s better information, that distinguishes the real winners in business and life.
- The Future
Finally, Galacticast presents [A Copyright Carol],a humorous 5-minute parody video on what might happen in the future as a result of lawslike the Canadian Digital Millennium Copyright Act[DMCA].
Well, that's it for 2007, see you all next year!
technorati tags: Nick Carr, Information, Knowledge, Authority, Democracy, Hunter Rawlings, Thomas Jefferson, James Madison, Library of Congress, Tim Ferriss, crowdsourcing, Stumbled Upon, Del.icio.us, collaborative filtering, Wisdom of Crowds, A Copyright Carol, Canadian, DMCA,
Yesterday, I was able to get the "Build 650" up and running under Qemu emulation onmy Thinkpad laptop computer. Today, I was able to get my Thinkpad and my XO laptoptalking to each other for a "chat".
The built-in "Chat" activity is one of the many kid-friendly activities included onthe XO laptop for the One Laptop Per Child [OLPC] project.It is also possible for two or more people to share other activities, like editing a textdocument, or browsing the internet.
As they say, emulation is only 95% complete, and this is true in this case as well. My Thinkpaddoes not have a built-in video camera, and for some reason the Qemu emulation does not let mehear any sound, despite specifying "-soundhw es1370" parameter. And lastly, it doesn't have the"mesh network" built-in Wi-Fi capability, just standard 54Mbps 802.1g through my Linksys router.
So, I set both XO and Thinkpad to use the new "xochat.org" jabber server so that the two couldsee each other:
$ sugar_control_panel -s jabber xochat.org
I set my XO nickname to be "TonyP" and my Thinkpad to be "Pearson", and chose blue-orange forthe first, and orange-blue for the second.
The process of starting a chat is similar to other IM systems like IBM Lotus Sametime. You havea neighborhood view that shows all people online using the same jabber server. In my case therewere about 30 or so icons on the screen. From the colors on my XO, I was able to locate my Thinkpad,and invite him to a chat. You can share the chat with everyone on the network, or keep it privatebetween two people. I tried both ways to see the difference.
In a private two-way chat, the first person starts up their Chat activity, and sends an inviteto join to another person. The second person sees a flashing chat bubble on the bottom of thescreen to the left of all the other action bar icons. The difference is that the chat bubble isblue-orange matching the sender, rather than black-and-white of the rest of the icons.
If the recipient happens to be busy doing something else full-screen, like browsing the web, theredoesn't seem to be any interruption. It is only when he goes to "home view" will he see the coloredchat bubble and decide to join or not.
The chat itself colorizes the text to match to color of the participant's icons. Blue for one, and orangefor the other. It two people had identical color schemes I guess it might be hard to tell. Thetext is white, so it is best to choose darker colors for contrast.
A nice feature is that you can save your chat session with the "keep" button on the upper rightpart of the screen, and your dialogue discussion will show up as an entry in the "journal".
Using this technique, it is possible for someone who has one "XO" laptop and one regular computer,or two regular computers, to develop and test applications that involve the sharing aspect of educational opportunities. Chats can be between students, student-to-teacher, or event student-to-mentor.
technorati tags: OLPC, XO, laptop, Qemu, Chat, xochat.org, develop, test, activities
Continuing my week's theme on the XO laptop from the One Laptop Per Child [OLPC
] foundation, I successfully managedto emulate my XO on another system.
Part of what is attractive of the XO laptop is the hardware, the high-resolution200dpi screen, the clever screen that rotates and folds flat into an eBook reader,and the water-tight, dust-proof keyboard. The other part is the software, howthey managed to pack an entire operating system, with useful applications, intoa 1GB NAND flash drive.
The drawback for developers like me is the risk of changing something that breaks the system. For example, my first attempt to create my own activityresulted in a blank space in my action bar, and my journal went into someinfinite loop, blinking as if it were still loading for minutes on end. I fixed it by deleting out the activity I created and rebooting.
To get around this, I successfully ran the disk-image under Linux's Virtual Machinesoftware called Qemu. This is an open source offering, with a proprietary add-onaccelerator called Kqemu. Here were the steps involved:
- Base Operating System
Qemu is now available to run on Linux, Windows and OS X-Intel. I have an Ubuntu 7.04"Feisty Faun" version of Linux installed on my system from a project I did last year, so decided to use that.
Normally, "apt-get install qemu" would be enough, but I wanted to get the latest release, so I downloaded the [0.9.0 version]tarball of compiled binaries. Note that trying to compile Qemu from source requiresa downlevel gcc-3.x compiler, and my attempts to do this failed. The compiled binariesworked fine.
The Kqemu author hasn't packaged this for distribution, so I download the source code anddid my own compiles. You can do the "configure-make-install" using the regular gcc 4.1compiler and it went smoothly.
Getting Kqemu active was bit of a challenge. I had to make sense of Nando Florestan's[Installing Kqumu in Ubuntu] article,and the subsequent comments that followed.
There is a tiny [8MB Linux image]that should be used to verify the Kqemu is activated correctly.
- The Disk Image
As with other development efforts, there are the older stable versions, and the bleedingedge development versions. I chose the 650 Build from the [Ship.2 stable versions], whichmatches the version on my XO laptop. The image comes as a *.bz2, which is a highly-compressedfile. Using "Bunzip2", the 221MB file expands to something like 932MB.
I renamed the resulting file to "build650.img"
Once I got all this done, I then made a simple script "launch" in my /home/tpearson/bin directory:
#!/bin/shqemu -m 256 -full-screen -kernel-kqemu -soundhw es1370 -net nic,model=rtl8139 -net user -hda $1
Then "launch build650.img" was all I needed to run the emulation. The full-screen mode helpsemulate the view on XO laptop. I was able to change the jabber server to "xochat.org" and see otherXO laptops online on my neighborhood view.
When running under Qemu, you can't just press Ctrl-Alt-something. For example, Ctrl-Alt-Erase onthe XO reboots the Sugar interface. However, do this on a Linux system, and it reboots your nativeX interface, blowing away everything.Instead, you press Ctrl-Alt-2 to get to the Qemu console, designated by (qemu) prompt,and then type:
Press "Ctrl-Alt-1" followed by "Ctrl-Alt" to get back to the emulated XO screen.
With this emulation, I am more likely to try new things, change files around, edit system files,and so on, without worrying about rendering my actual XO laptop unusable. Once debugged, I canthen work on moving them over to my XO, one at a time.
technorati tags: OLPC, XO, laptop, Qemu, Kqemu, Ubuntu, Linux, Activity, Journal
Wrapping up this week's theme on the XO laptop, I decided to take on thechallenge of printing. I managed to print from my XO laptop to my laserjet printer.I checked the One Laptop Per Child [OLPC
] website,and found there is no built-in support for printers, but there have been several peopleasking how to print from the XO, so here are the steps I did to make it happen.
(Note: I did all of these steps successfully on my Qemu-emulated system first, and then performed them on my XO laptop)
- Step 1: Determine if you have an acceptable printer
The XO laptop can only connect to a printer via USB cable or over the network.Check your printer to see if it supports either of these two options. In my case, my printer is connected to my Linksys hub that offers Wi-Fi in my home.
The XO runs a modified version of Red Hat's Fedora 7, so we need to also determineif the printer is supported on Linux.Check the [Open Printing Database]for the level of support. This database has come up with the following ranking system.Printers are categorized according to how well they work under Linux and Unix. The ratings do not pertain to whether or not the printer will be auto-recognized or auto-configured, but merely to the highest level of functionality achieved.
- Perfectly - everything the printer can do is working also under Linux
- Mostly - work almost perfectly - funny enhanced resolution modes may be missing, or the color is a bit off, but nothing that would make the printouts not useful
- Partially - mostly don't work; you may be able to print only in black and white on a color printer, or the printouts look horrible
- Paperweight - These printers don't work at all. They may work in the future, but don't count on it
If your printer only supports a parallel cable connection, or does not have a high enough ranking above, go buy another printer. The [Linux Foundation] websiteoffers a list of suggested printers and tutorials.
In my case, I have a Brother HL5250-DN black-and-white laserjet printer connected over a network to Windows XP, OS X and my other Linux systems. It is rated as supporting Linux perfectly, so I decided to use this for my XO laptop.
- Step 2: Install Common UNIX Printing System (CUPS)
Technically, Linux is not UNIX, but for our purposes, close enough. Start the Terminalactivity, use "su" to change to root, and then use "yum" to install CUPS. Yum will automatically determine what other packages are needed, in this case paps and tmpwatch. Once installed, use "/usr/sbin/cupsd" to get the CUPS daemon started, and add this to the end ofrc.local so that it gets started every time you reboot.
Click graphic on the left to see larger view
[olpc@xo-10-CC-6F ~]$ subash-3.2# yum install cups...Total download size = 3.0 MIs this OK [y/N]? y
bash-3.2# /usr/sbin/cupsdbash-3.2# echo "/usr/sbin/cupsd" >> /etc/rc.d/rc.localbash-3.2# exit[olpc@xo-10-CC-6F ~]$
- Step 3: Install Opera or Firefox browser
To download the appropriate drivers, you may need a browser that can handle file downloads. I have triedto do this with the built-in Browse activity (aka Gecko) but encountered problems. I have both Opera and Firefox installed, but I will focus on Opera for this effort.I also installed the older184.108.40.206 version of the Flash player (worked better than the latest 220.127.116.11 version) and Java JRE.Follow the OLPC Wiki instructions for [Opera, Adobe Flash,and Sun Java] installation, thenverify with the following [Java and Flash] testers.
- Step 4: Download drivers and packages unique for your printer
In my case, I used Opera to get to the [Brother Linux Driver Homepage], and downloaded the RPM's for LPR and CUPS wrapper. These are the ones listed under "Drivers for Red Hat, Mandrake (Mandriva), SuSE". I saved these under "/home/olpc" directory.
[olpc@xo-10-CC-6F ~]$ subash-3.2# cd /home/olpcbash-3.2# rpm -vi brhl5250dnlpr-2.0.1-1.i386.rpmbash-3.2# rpm -vi cupswrapperHL5250DN-2.0.1-1.i386.rpmbash-3.2# exit[olpc@xo-10-CC-6F ~]$
- Step 5: Create a "root" password
By default, the root user has no password. However, you will need it to be something for later steps,so here is the process to create a root password. I set mine to "tony" which normallywould be considered too simple a password, but ignore those messages and continue.We will remove it in step 8 (below) to put things back to normal.
[olpc@xo-10-CC-6F ~]$ subash-3.2# passwdChanging password for user root.New UNIX password: tonyBAD PASSWORD: it is too shortRetype new UNIX password: tonypasswd: all authentication tokens updated successfullybash-3.2# exit[olpc@xo-10-CC-6F ~]$
- Step 6: Launch CUPS administration
Here I followed the instructions in Robert Spotswood's [Printing In Linux with CUPS] tutorial.Launch the Opera browser, and enter "http://localhost:631/admin" as the URL. The localhostrefers to the laptop itself, and 631 is the special port that CUPS listens to from browsers. You can alsouse 127.0.0.1 as a shortcut for "localhost", and can be used interchangeably.
In my case, it detected both of my networked printers, so I selected the HL5250DN, entered thelocation of my PPD file "/usr/share/cups/model/HL5250DN.ppd" that was created in Step 4. I set the URI to "lpd://192.168.0.75/binary_p1" per the instructions [Network Setting in CUPS based Linux system] in the Brother FAQ page. I chage the page size from "A4" to "Letter".I set this printer as the default printer. When it asks for userid and password, that is whereyou would enter "root" for the user, and "tony" or whatever you decided to set your root password to.
Select "Print a Test Page" to verify that everything is working.
- Step 7: Printing actual files
Sadly, I don't know Opera well enough to know how to print from there. So, I went over to my trustedFirefox browser. Select File->Page Setup to specify the settings, File->Print Preview tosee what it will look like, and then File->Print to send it to the printer.
To print the file "out.txt" that is in your /home/olpc directory, for example, enter"file:///home/olpc/out.txt" as the URL of the firefox browser. This will show the file,which you can then print to your printer. I had to specify 200% scaling otherwise the fontswere too small to read.
- Step 8: Remove the "root" password
If you want to remove the root password, here are the steps.
[olpc@xo-10-CC-6F ~]$ suPassword: tonybash-3.2# passwd -d rootRemoving password for user root.passwd: Successbash-3.2# exit[olpc@xo-10-CC-6F ~]$
Now the problem is that there is no way to print stuff from any of the Sugar activities. The best place toput in print support would be the Journal
activity. Along the bottom where the mounted USB keys arelocated could be an icon for a printer, and dragging a file down to the printer ojbect could cause it tobe send to the printer.
The alternative is to write some scripts invocable from the Terminal activity to determine what isin the journal, and send them to LPR with the appropriate parameters.
I did not have time to do either of these, but perhaps someone out there can take on that as a project.
technorati tags: OLPC, XO, printing, printer, linux, Opera, Firefox, Java, Flash
Continuing my week's theme on the XO laptop from the One Laptop Per Child [OLPC
] project, I have been amused watching the OLPC forum discussion on the choiceof browser options available.
- Built-in Browser
The built-in browser is simple but functional. It is full screen,with back, forward, and bookmark buttons, and an entry field forthe URL. This browser is fully integrated with the Sugar platform,files downloaded will appear in the journal. Download an Activity*.xo file, for example, and you can install it from the Journal.If you want to upload a file, click BROWSE on the website, and theJournal will pop up to choose files from.
Out of the box, the XO supports a minimal Flash that can handlesome Flash-based games but not YouTube videos, and does not supportJava.
The good folks of Opera have built a special edition for the XO laptop.However, some settings need to be changed to make the fonts large enoughto read.
Opera can be run as a Sugar activity, but this just launches a mothertask, which in turn launches a daughter task that actually runs thebrowser. This means that Home View will have two icons. The mothertask has an the Opera icon, but click on it and you get a grey screen.The daughter task appears as a grey circle, click on it and you get thebrowser screen. Alt-Tab will rotate through the Activities, so thegrey screen of the mother task is part of the rotation.
Although Opera has one foot on the Sugar platform, and one foot off,the lack of integration means poor interaction with the journal. The use of Opera is correctly registered. However, downloadingfiles requires a working knowledge of subdirectories, and uploading anythingrequires knowing what it is called, and where it is located. Not obviousfor many of the items created by Sugar applications.
The XO laptop is based on Redhat Fedora distribution, so I downloadedthe Firefox RPM package and installed this. To run, you need to startthe Terminal Activity, and then at the cursor type firefox.Journal only registers that the Terminal activity was used, but not anythingelse.
Since I run Firefox 2.0 on Windows XP, OS X and Linux, I am very familiarwith this browser, and it works as expected. Like Opera, there are shortcut keys, tabs for multiple pages, and optionsto add Java and Flash player. I was able to install add-onsfor Del.icio.us and FireFTP, and they worked as expected. Having accessto FTP sites will make development on the XO much easier.Again all files are uploaded/downloaded to directories, so some workingknowledge of where files are placed is required.
The fonts in Firefox did not expand/shrink as nicely as they had in Opera.Be careful not to select "View->
To close, you have to select File->Quit from the browser window, whichbrings you back to the Terminal activity, which you can then shutdown with Ctrl-Esc.
For now, I will keep all three and continue to evaluate them.I saw a few opportunities for improvement:
- The Opera and Terminal icons are not on the first screen.You have to hit the right arrow to get to the "overflow" set of icons. Re-ordering the icons is simply a matter of editing the following file with "vi"(my first few lines I use are shown below):
Put the activities in the order you want. Any activity not listed willappear after these.
- It might be possible to create a modified Terminal activity thatinvoked Firefox directly, to eliminate having to type it in each time.
- Several people have expressed interest in a browser that runs entirely withthe Xo laptop folded over in eBook/Game mode, such that thekeyboard is completely covered up, exposing only the up-left-right-down arrowsand the Circle/Square/X/Check buttons.
- Change the "News Reader" to invoke Bloglines instead. This might be yetanother modified Terminal activity, but borrow the icon from News.
Well, if you have further thoughts on these browsers, enter a comment below.