Tonight PBS plans to air Season 38, Episode 6 of NOVA, titled [Smartest Machine On Earth]. Here is an excerpt from the station listing:
"What's so special about human intelligence and will scientists ever build a computer that rivals the flexibility and power of a human brain? In "Artificial Intelligence," NOVA takes viewers inside an IBM lab where a crack team has been working for nearly three years to perfect a machine that can answer any question. The scientists hope their machine will be able to beat expert contestants in one of the USA's most challenging TV quiz shows -- Jeopardy, which has entertained viewers for over four decades. "Artificial Intelligence" presents the exclusive inside story of how the IBM team developed the world's smartest computer from scratch. Now they're racing to finish it for a special Jeopardy airdate in February 2011. They've built an exact replica of the studio at its research lab near New York and invited past champions to compete against the machine, a big black box code -- named Watson after IBM's founder, Thomas J. Watson. But will Watson be able to beat out its human competition?"
Craig Rhinehart offers
[10 Things You Need to Know About the Technology Behind Watson].
An artist has come up with this clever
Dr. Jon Lenchner from IBM Research has a series of posts on
[How Watson "sees", "hears", and "speaks"] and [Selected Nuances].
Like most supercomputers, Watson runs the Linux operating system. The system runs 2,880 cores (90 IBM Power 750 servers, four sockets each, eight cores per socket) to achieve 80 [TeraFlops]. TeraFlops is a common unit of measure for supercomputer performance, representing a trillion floating point operations per second. By comparison, Hans Moravec, principal research scientist at the Robotics Institute of Carnegie Mellon University (CMU), estimates that the [human brain is about 100 TeraFlops]. So, in the three seconds that Watson gets to calculate its response, it would have processed 240 trillion operations.
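For readers who want to check the arithmetic in the paragraph above, here is a minimal Python sketch. The constants come straight from the post; the one assumption (mine, flagged in a comment) is that Watson sustains its 80 TeraFlops peak for the entire three-second window, which is what the 240 trillion figure implies.

```python
# Back-of-the-envelope check of the Watson compute figures quoted above.
SERVERS = 90             # IBM Power 750 servers
SOCKETS_PER_SERVER = 4
CORES_PER_SOCKET = 8
PEAK_TFLOPS = 80         # trillion floating-point operations per second
ANSWER_WINDOW_S = 3      # seconds Watson gets to respond


def total_cores(servers: int, sockets: int, cores: int) -> int:
    """Cores = servers x sockets per server x cores per socket."""
    return servers * sockets * cores


def trillions_of_ops(tflops: float, seconds: float) -> float:
    """Operations (in trillions) at a constant rate over a time window.
    Assumption: the peak rate is sustained for the whole window."""
    return tflops * seconds


print(total_cores(SERVERS, SOCKETS_PER_SERVER, CORES_PER_SOCKET))  # 2880
print(trillions_of_ops(PEAK_TFLOPS, ANSWER_WINDOW_S))               # 240
```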
Several readers of my blog have asked for details on the storage aspects of Watson. Basically, it is a modified version of IBM Scale-Out NAS [SONAS] that IBM offers commercially, but running Linux on POWER instead of Linux-x86. System p expansion drawers of SAS 15K RPM 450GB drives, 12 drives each, are dual-connected to two storage nodes, for a total of 21.6TB of raw disk capacity. The storage nodes use IBM's General Parallel File System (GPFS) to provide clustered NFS access to the rest of the system. Each Power 750 has minimal internal storage mostly to hold the Linux operating system and programs.
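The post gives the drive size, the drives per drawer and the total raw capacity, but not the drawer count. A quick sketch, assuming decimal units (1TB = 1,000GB), shows those figures work out to 48 drives in four 12-drive drawers. The drawer count is derived here from the stated numbers, not quoted from IBM.

```python
# Derive the drive and drawer counts implied by the Watson storage figures.
RAW_CAPACITY_GB = 21_600   # 21.6TB raw (decimal units: 1TB = 1,000GB)
DRIVE_SIZE_GB = 450        # SAS 15K RPM 450GB drives
DRIVES_PER_DRAWER = 12     # drives per System p expansion drawer


def whole_units(total: int, per_unit: int) -> int:
    """Integer division that refuses to silently drop a remainder."""
    count, remainder = divmod(total, per_unit)
    if remainder:
        raise ValueError(f"{total} is not a multiple of {per_unit}")
    return count


drives = whole_units(RAW_CAPACITY_GB, DRIVE_SIZE_GB)   # 48 drives
drawers = whole_units(drives, DRIVES_PER_DRAWER)       # 4 drawers
print(drives, drawers)
```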
When Watson is booted up, its 15TB of total RAM is loaded, and thereafter the DeepQA processing is all done from memory. According to IBM Research, "The actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1TB." For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained; Watson is NOT going to the internet searching for answers.
On ZDnet, Steven J. Vaughan-Nichols welcomes our new [Linux Penguin Jeopardy overlords]. I have to say I share his enthusiasm!
technorati tags: IBM, Nova, Watson, #ibmwatson, Jeopardy, POWER7, p750, supercomputer, TeraFlops, disk, SONAS, GPFS, SAS, Craig Rhinehart, Jon Lenchner, Hans Moravec, Carnegie Mellon University, CMU
This week I am in Moscow, Russia for today's "Edge Comes to You" event. Although we had over 20 countries represented at the Edge2012 conference in Orlando, Florida earlier this month, IBM realizes that not everyone can travel to the United States. So, IBM has created the "Edge Comes to You" events where a condensed subset of the agenda is presented. Over the next four months, these events are planned in about two dozen other countries.
This is my first time in Russia, and the weather was very nice. With over 11 million people, Moscow is the 6th largest city in the world, and boasts the largest community of billionaires. With this trip, I have now been to all five of the so-called BRICK countries (Brazil, Russia, India, China and Korea) in the past five years!
The venue was the [Info Space Transtvo Conference Center] not far from the Kremlin. While Barack Obama was making friends with Vladimir Putin this week at the 2012 G20 Summit in Mexico, I was making friends with the lovely ladies at the check-in counter.
If it looks like some of the letters are backwards, that is not an illusion. The Russian language uses the [Cyrillic alphabet]. The backwards N ("И"), backwards R ("Я"), the number 3 ("З"), and what looks like the big blue staple logo from NetApp ("П") are actually all characters in this alphabet.
Having spent eight years in a fraternity during college, I found these not much different from the Greek alphabet. Once you learn how to pronounce each of the 33 characters, you can get by quite nicely in Moscow. I successfully navigated my way through Moscow's famous subway system, and ordered food from restaurant menus.
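If you want to confirm that these lookalikes really are separate characters, and not reversed Latin letters, Python's standard `unicodedata` module will name them. This sketch also counts the 33 letters, assuming the usual convention that the Russian alphabet is the contiguous Unicode block А through Я plus Ё placed after Е.

```python
import unicodedata

# The modern Russian alphabet: the contiguous block U+0410..U+042F (А..Я,
# 32 letters) plus Ё (U+0401), which is conventionally placed after Е.
BLOCK = [chr(cp) for cp in range(0x0410, 0x0430)]
RUSSIAN_UPPER = BLOCK[:6] + ["\u0401"] + BLOCK[6:]

# The "backwards N", "backwards R", "3" and "staple" lookalikes from the post.
LOOKALIKES = "\u0418\u042F\u0417\u041F"   # И Я З П

for ch in LOOKALIKES:
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))

print(len(RUSSIAN_UPPER))  # 33
```

Running it prints lines such as `И U+0418 CYRILLIC CAPITAL LETTER I`, followed by the letter count.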
The conference coordinators were Tatiana Eltekova (left) and Natalia Grebenshchikova (right). Business is booming in Russia, and IBM just opened ten new branch offices throughout the country this month. So these two ladies in the marketing department have been quite busy lately.
I especially liked all the attention to detail. For example, the signage was crisp and clean, and the graphics all matched the PowerPoint charts of each presentation.
Moscow is quite far north, at a latitude similar to that of Juneau, Alaska; Edinburgh, Scotland; Copenhagen, Denmark; and Stockholm, Sweden.
As a result, at this time of year it is daylight for nearly 18 hours a day. The first part of the day, from 8:00am to 4:30pm, was "Technical Edge", a condensed version of the 4.5-day event in Orlando, Florida. I gave three of the five keynote presentations:
- Game Change on a Smarter Planet: A New Era in IT, discussing Smarter Computing and Expert-Integrated systems, based on what Rod Adkins presented in Orlando.
- A New Approach to Storage, explaining IBM Smarter Storage for Smarter Computing, IBM's new approach to the way storage is designed and deployed for our clients.
- IBM Watson: How it Works and What it Means for Society Beyond Winning Jeopardy!, explaining how IBM Watson technologies are being used in Healthcare and Financial Services, based on what I presented in Orlando.
(Note: I do not speak Russian fluently enough to give a technical presentation, so I did the entire presentation in English, and had real-time translators convert it to Russian for me. The audience wore headphones. However, I was able to sprinkle in a few Russian phrases, such as "доброе утро" ("good morning"), "Я не понимаю по-русски" ("I don't understand Russian") and "спасибо" ("thank you").)
After the keynote sessions, I was interviewed by a journalist for [Storage News] magazine. The questions covered a variety of topics, from the implications of [Big Data analytics] to the future of storage devices that employ [Phase Change Memory]. I look forward to reading the article when it gets published!
The afternoon had break-out sessions in three separate rooms. Each room hosted seven topics, giving the attendees plenty to choose from for each time slot. I presented one of these break-out sessions, Big Data Cloud Storage Technology Comparison. The title was already printed in all the agendas, so we went with it, but I would have rather called it "Big Data Storage Options". In this session, I explained Hadoop, InfoSphere BigInsights, and internal and external storage options.
I spent some time comparing the Hadoop Distributed File System (HDFS) with IBM's own General Parallel File System (GPFS), which now offers Hadoop interfaces in a Shared-Nothing Cluster (SNC) configuration. IBM GPFS is about twice as fast as HDFS for typical workloads.
At the end of the Technical Edge event, there was a prize draw. Business cards were drawn at random, and three lucky attendees won a complete four-volume set of my book series "Inside System Storage"! Sadly, these got held up in customs, so we provided a "certificate" to redeem for the books when they arrive at the IBM office.
The second part of the day, from 5:00pm to 8:00pm, was "Executive Edge", a condensed version of the two-day event in Orlando, designed for CIOs and IT leaders. Holding this event in the evening allowed busy executives to come over after spending the day in the office. I presented IBM Storage Strategy in the Smarter Computing Era, similar to my presentation in Orlando.
Both events were well-attended. Despite fighting jet lag across 11 time zones, I managed to hang in there for the entire day. I got great feedback and comments from the attendees. I look forward to hearing how the other "Edge Comes to You" events fare in the other countries. I would like to thank Tatiana and Natalia for their excellent work organizing and running this event!
technorati tags: IBM, Moscow, Russia, Edge, ECTY, Cyrillic, Tatiana Eltekova, Natalia Grebenshchikova, Smarter Storage, Smarter Computing, Smarter Planet, Big Data, Cloud, IBM Watson, Jeopardy, Hadoop, HDFS, InfoSphere, BigInsights, GPFS, GPFS-SNC
Continuing my coverage of the 30th annual [Data Center Conference], here is a recap of Wednesday's breakout sessions.
- Private Cloud Computing at Bank of America – One Year Later
Prentice Dees, Senior VP for Systems Automation Engineering at Bank of America, did the happy dance celebrating their success implementing a private cloud. Bank of America, which merged with Merrill Lynch, has 29 million users residing in over 100 countries, and 5900 retail offices in 40 countries. They manage $1 trillion US dollars in deposits and $2.2 trillion in assets.
Rather than IaaS or PaaS, his team focused on Application-as-a-Service (AaaS). Their goal is to transform and move IT out of the way of the business. In his view, if a human has to touch a keyboard, then his team has failed.
He divides the work up into three layers:
- Bones: These are the physical components, such as servers, storage, switches that provide capacity and interconnect.
- Muscle: This is the translation layer, providing actions and reporting.
- Brains: This is the layer for intelligent automation.
Provisioning new servers with storage involves three sets of steps. The first set of steps involves requesting approval. The second set of steps deploys the server. The third involves installing the application, loading the data and using it until End-of-Life. The second set of steps took 14 to 60 days before, and has been automated down to one to three hours.
The result is that his team has improved server utilization 10x, over-provisioned storage 4x, and now hosts over 11,000 server images, saving $20 million US dollars. Not only is this a lower cost per application deployed, but the process also allows for lower-skilled personnel. He has over 500TB of virtual storage deployed, using thin provisioning, on only 128TB of physical disk. But they have only scratched the surface. Only 15 to 20 percent of their environment is virtualized in this manner, and they want to get to 80 percent within the next three years.
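Thin provisioning is what lets 500TB of virtual storage sit on 128TB of physical disk: capacity is promised to each volume up front but consumed only as data is written. Here is a toy Python model of the idea. It is not Bank of America's setup or any vendor's implementation; the pool, volume names and sizes are hypothetical, and overwrites are ignored for simplicity. Five 100TB volumes on 128TB of disk give a ratio of about 3.9, which is the "4x" over-provisioning mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ThinPool:
    """Toy thin-provisioned pool: virtual capacity is promised when a volume
    is created, but physical capacity is consumed only as data is written.
    Simplification: every write is treated as new data (no overwrites)."""
    physical_tb: int
    virtual: Dict[str, int] = field(default_factory=dict)  # volume -> promised TB
    written: Dict[str, int] = field(default_factory=dict)  # volume -> TB written

    def create_volume(self, name: str, size_tb: int) -> None:
        # No physical space is reserved here; that is the point of thin provisioning.
        self.virtual[name] = size_tb
        self.written[name] = 0

    def write(self, name: str, tb: int) -> None:
        if self.written[name] + tb > self.virtual[name]:
            raise ValueError(f"write exceeds the virtual size of {name}")
        if self.used_tb() + tb > self.physical_tb:
            raise RuntimeError("thin pool is out of physical capacity")
        self.written[name] += tb

    def used_tb(self) -> int:
        return sum(self.written.values())

    def overprovision_ratio(self) -> float:
        return sum(self.virtual.values()) / self.physical_tb


pool = ThinPool(physical_tb=128)
for i in range(5):                        # five hypothetical 100TB volumes
    pool.create_volume(f"vol{i}", 100)
pool.write("vol0", 40)
print(pool.overprovision_ratio())         # 3.90625
print(pool.used_tb())                     # 40
```

The trade-off is visible in `write`: the pool can run out of physical space even though every volume is still below its promised size, which is why thin pools need capacity monitoring.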
What makes an application not "Cloud-ready"? Prentice is a big fan of Linux and Open Source solutions. Some applications consume the entire server. In other cases, code changes are required. If possible, try to split up large applications into smaller Cloud-ready chunks.
How many people on his team? There are currently 16 to 20 people on the team, but at its peak there were 30 people.
Rather than wasting time on capacity planning, his team focuses on a cost recovery model instead. Seed capital in combination with rock-solid cost recovery is the way to go. "All models are wrong," the saying goes, "but some are useful!"
A nice side benefit of this new approach is that maintenance is greatly improved. Rather than rushing to fix problems, you roll the application over to another host machine, and then take your time fixing the failed hardware.
How does the team deal with requests for dedicated resources? Give them the keys to their own miniature private cloud. Let them provision from their dedicated resources using the same methods you use to provision everyone else. This allows them to get comfortable with the process, and eventually join the rest of the shared pool. Analytics can be used to find "rogue VMs" that don't play well with others.
Their automation is a mix of commercial and open source software, with home-grown scripts. They have one "Orchestration Management Data Base" (OMDB) to manage multiple disparate Configuration Management Databases (CMDBs). Chargeback is not quite individual pay-per-use, but more at the departmental level.
- Aging Data: The Challenges of Long-Term Data Retention
The analyst defined "aging data" as any data that is older than 90 days. A quick poll of the audience showed which type of data was the biggest challenge:
In addition to aging data, the analyst used the term "vintage" to refer to aging data that you might actually need in the future, and "digital waste" for data you have no use for. She also defined "orphaned" data as data that has been archived but is not actively owned or managed by anyone.
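The analyst's vocabulary maps neatly onto a small classification function. This is a minimal sketch under my own assumptions: the `DataSet` fields, and especially the `may_be_needed` flag standing in for a business judgment, are hypothetical. Only the 90-day threshold and the label definitions come from the session.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

AGING_THRESHOLD = timedelta(days=90)   # "aging" = older than 90 days


@dataclass
class DataSet:
    name: str
    created: date
    archived: bool
    owner: Optional[str]       # None means nobody actively owns or manages it
    may_be_needed: bool        # hypothetical business-judgment flag


def classify(ds: DataSet, today: date) -> List[str]:
    """Apply the analyst's vocabulary to one data set."""
    labels = []
    if today - ds.created > AGING_THRESHOLD:
        labels.append("aging")
        if ds.may_be_needed:
            labels.append("vintage")       # aging data you might need later
    if not ds.may_be_needed:
        labels.append("digital waste")     # data you have no use for
    if ds.archived and ds.owner is None:
        labels.append("orphaned")          # archived, but nobody owns it
    return labels


old_logs = DataSet("2010-logs", date(2010, 1, 1), archived=True,
                   owner=None, may_be_needed=True)
print(classify(old_logs, date(2011, 12, 1)))   # ['aging', 'vintage', 'orphaned']
```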
You need policies for retention, deletion, legal hold, and access. Most people forget to include access policies. How are people dealing with data and retention policies? Here were the poll results:
The analyst predicts that half of all applications running today will be retired by 2020. Tools like "IBM InfoSphere Optim" can help with application retirement by preserving both the data and metadata needed to make sense of the information after the application is no longer available. App retirement has a strong ROI.
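Going back to the four policy types the analyst listed (retention, deletion, legal hold and access), here is one hypothetical way to express them as data and evaluate them together. The field names, the seven-year retention period and the user names are made-up examples, not recommendations from the session.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Set


@dataclass
class Policy:
    """Hypothetical per-record policy bundle covering the four policy types."""
    retain_for: timedelta                           # retention: minimum time to keep
    delete_when_expired: bool = True                # deletion: purge once retention ends
    legal_hold: bool = False                        # legal hold: overrides deletion
    readers: Set[str] = field(default_factory=set)  # access: who may read it


def may_delete(p: Policy, created: date, today: date) -> bool:
    """Deletion is allowed only after retention expires and no legal hold applies."""
    expired = today - created >= p.retain_for
    return expired and p.delete_when_expired and not p.legal_hold


def may_read(p: Policy, user: str) -> bool:
    """The access policy that, per the analyst, most people forget to define."""
    return user in p.readers


p = Policy(retain_for=timedelta(days=7 * 365), readers={"records-mgmt"})
print(may_delete(p, date(2001, 1, 1), date(2011, 12, 1)))   # True
p.legal_hold = True
print(may_delete(p, date(2001, 1, 1), date(2011, 12, 1)))   # False
print(may_read(p, "intern"))                                # False
```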
Another problem is that unstructured data is growing, but nobody is given the responsibility of "archivist" for this data, so it goes unmanaged and becomes a "dumping ground". Long-term retention involves hardware, software and process working together. The reason that purpose-built archive hardware (such as IBM's Information Archive or EMC's Centera) often fell short was that companies failed to get the appropriate software and process to complete the solution.
Cloud computing will help. The analyst estimates that 40 percent of new email deployments will be done in the cloud, using services such as IBM LotusLive, Google Apps, and Microsoft Office 365. This offloads the archive requirement to the public cloud provider.
One case study is the University of Minnesota Supercomputing Institute, which has three tiers for its storage: 136TB of fast storage for scratch space, 600TB of slower disk for project space, and 640TB of tape for long-term retention.
What are people using today to hold their long-term retention data? Here were the poll results:
The bottom line is that retention of aging data is a business problem, a technology problem, an economic problem and a 100-year problem.
- A Case Study for Deploying a Unified 10G Ethernet Network
Brian Johnson from Intel presented the latest developments on 10Gb Ethernet. Case studies from Yahoo and NASA, both members of the [Open Data Center Alliance], found that upgrading from 1Gb to 10Gb Ethernet was more than just an improvement in speed. Other benefits include:
- 45 percent reduction in energy costs for Ethernet switching gear
- 80 percent fewer cables
- 15 percent lower costs
- doubled bandwidth per server
Ruiping Sun, from Yahoo, found that 10Gb FCoE achieved 920 MB/sec, which was 15 percent faster than the 8Gb FCP they were using before.
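Working backwards from Yahoo's numbers is a useful sanity check. If 920 MB/sec is 15 percent faster than the old setup, the 8Gb FCP baseline was about 800 MB/sec, which matches the nominal rate usually quoted for 8Gb Fibre Channel. The sketch below also computes what fraction of 10Gb Ethernet's nominal 1,250 MB/sec that represents; using decimal units and ignoring Ethernet and FCoE framing overhead are my own simplifications.

```python
def implied_baseline(new_rate: float, pct_faster: float) -> float:
    """Solve new_rate = baseline * (1 + pct/100) for the baseline.
    Written as new_rate * 100 / (100 + pct) so whole-number inputs stay exact."""
    return new_rate * 100 / (100 + pct_faster)


def link_utilization(mb_per_s: float, link_gbit_s: float) -> float:
    """Fraction of a link's nominal data rate in use (decimal units, 8 bits per byte)."""
    return mb_per_s / (link_gbit_s * 1000 / 8)


fcp_mb_s = implied_baseline(920, 15)
print(fcp_mb_s)                                  # 800.0
print(round(link_utilization(920, 10), 3))       # 0.736
```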
IBM, Dell and other Intel-based servers support Single Root I/O Virtualization, or SR-IOV for short. NASA found that cloud-based HPC is feasible with SR-IOV. Using IBM General Parallel File System (GPFS) and 10Gb Ethernet, NASA was able to replace a previous environment based on 20 Gbps DDR InfiniBand.
While some companies are still arguing over whether to implement a private cloud, an archive retention policy, or 10Gb Ethernet, other companies have shown great success moving forward.
technorati tags: IBM, BofA, Prentice+Dees, AaaS, Linux, Open Source, OMDB, CMDB, Aging data, Archive, Retention, InfoSphere, Optim, LotusLive, University Minnesota, 10GbE, SR-IOV, GPFS, private cloud
Well, it's Tuesday, and you know what that means... IBM announcements!
In today's environment, clients expect more from their storage, and from their storage provider. The announcements span the gamut, from helping to use Business Analytics to analyze Big Data for trends, insights and patterns, to managing private, public and hybrid cloud environments, all with systems that are optimized for their particular workloads.
There are over a dozen different announcements, so I will split these up into separate posts. Here is part 1.
- IBM Scale Out Network Attached Storage (SONAS) R1.3
I have covered [IBM SONAS] for quite some time now. Based on IBM's General Parallel File System (GPFS), this integrated system combines servers, storage and software into a fully functional scale-out NAS solution that supports NFS, CIFS, FTP/SFTP, HTTP/HTTPS, and SCP protocols. IBM continues its technical leadership in the scale-out NAS marketplace with new hardware and software features.
The hardware adds new disk options, with 900GB SAS 15K RPM drives and 3TB NL-SAS 7200 RPM drives. These come in 4U drawers of 60 drives each, arranged as six ranks of ten drives each. So, with the high-performance SAS drives that would be about 43TB usable capacity per drawer, and with the high-capacity NL-SAS drives about 144TB usable. You can have any mix of high-performance drawers and high-capacity drawers, up to 7200 drives, for a maximum usable capacity of 17PB (21PB for those who prefer it raw). This makes it the largest commercial scale-out NAS in the industry. This capacity can be made into one big file system, or divided into as many as 256 smaller file systems.
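The drawer and system capacities above can be reproduced with a few lines of Python. The post does not say how each ten-drive rank is protected; I am assuming RAID 6 in an 8+2 layout, because that is the layout that makes 43TB and 144TB per drawer work out. Units are decimal GB.

```python
from typing import Tuple

# Reproduce the SONAS R1.3 capacity figures quoted above (decimal units, GB).
DRIVES_PER_DRAWER = 60
RANKS_PER_DRAWER = 6
DRIVES_PER_RANK = 10
PARITY_PER_RANK = 2      # assumption: RAID 6 ranks in an 8+2 layout
MAX_DRIVES = 7200

assert RANKS_PER_DRAWER * DRIVES_PER_RANK == DRIVES_PER_DRAWER


def usable_gb_per_drawer(drive_gb: int) -> int:
    """Usable capacity = data drives per drawer x drive size."""
    data_drives = RANKS_PER_DRAWER * (DRIVES_PER_RANK - PARITY_PER_RANK)
    return data_drives * drive_gb


def max_system_gb(drive_gb: int) -> Tuple[int, int]:
    """(usable, raw) capacity if every drawer uses the same drive size."""
    drawers = MAX_DRIVES // DRIVES_PER_DRAWER
    return drawers * usable_gb_per_drawer(drive_gb), MAX_DRIVES * drive_gb


print(usable_gb_per_drawer(900))    # 43200  -> ~43TB per SAS drawer
print(usable_gb_per_drawer(3000))   # 144000 -> 144TB per NL-SAS drawer
print(max_system_gb(3000))          # (17280000, 21600000) -> ~17PB usable, ~21.6PB raw
```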
In addition to snapshots of each file system, you can divide the file system up into smaller tree branches and snapshot these independently as well. The tree branches are called fileset containers. Furthermore, you can now make writeable clones of individual files, which provides a space-efficient way to create copies for testing, training or whatever.
Performance is improved in many areas. The interface nodes can now support a second dual-port 10GbE adapter, and replication performance is improved by 10x.
SONAS supports access-based enumeration, which means that if there are 100 different subdirectories, but you only have authority to access five of them, then that's all you see, those five directories. You don't even know the other 95 directories exist.
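Conceptually, access-based enumeration is just a filter applied to the directory listing before it is returned. Here is a minimal Python sketch, assuming a hypothetical ACL model where each subdirectory maps to the set of users allowed to access it (real SONAS ACLs are richer than this).

```python
from typing import Dict, List, Set


def visible_entries(acl: Dict[str, Set[str]], user: str) -> List[str]:
    """Access-based enumeration: list only the subdirectories the user is
    allowed to access; the rest are omitted rather than shown as 'denied'.
    `acl` maps subdirectory name -> users granted access (hypothetical model)."""
    return sorted(name for name, allowed in acl.items() if user in allowed)


# 100 hypothetical subdirectories; "alice" may access only every 20th one.
acl = {f"dir{i:03d}": ({"alice"} if i % 20 == 0 else set()) for i in range(100)}
print(visible_entries(acl, "alice"))
# ['dir000', 'dir020', 'dir040', 'dir060', 'dir080']
```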
I saved the coolest feature for last: Active Cloud Engine™, which offers both local and global file management. Locally, Active Cloud Engine uses placement rules to decide which type of disk a new file should be placed on, and management rules to move files from one disk type to another, or even migrate the data to tape or other externally-managed storage! A high-speed scan engine can rip through 10 million files per node to identify files that need to be moved, backed up or expired.
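To make the difference between placement rules and management rules concrete, here is a hypothetical rule set expressed in Python. It only illustrates the concept: GPFS, which SONAS is built on, uses its own SQL-like policy rule language, which this sketch does not attempt to reproduce, and the pool names, file paths and 30/365-day thresholds are all invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class FileInfo:
    path: str
    last_access: datetime
    pool: str = ""


def placement_rule(f: FileInfo) -> str:
    """Hypothetical placement rule: database files land on fast SAS disk,
    everything else on high-capacity NL-SAS."""
    return "sas" if f.path.endswith((".db", ".dbf")) else "nlsas"


def management_rules(files: List[FileInfo], now: datetime) -> List[Tuple[str, str, str]]:
    """Hypothetical management rules, evaluated in order:
    idle > 365 days -> tape; idle > 30 days on SAS -> NL-SAS.
    Returns the (path, from_pool, to_pool) moves and updates each file's pool."""
    moves = []
    for f in files:
        idle = now - f.last_access
        if idle > timedelta(days=365) and f.pool != "tape":
            target = "tape"
        elif idle > timedelta(days=30) and f.pool == "sas":
            target = "nlsas"
        else:
            continue
        moves.append((f.path, f.pool, target))
        f.pool = target
    return moves


now = datetime(2011, 10, 1)
files = [FileInfo("/gpfs/app/orders.db", now - timedelta(days=45)),
         FileInfo("/gpfs/home/old_report.pdf", now - timedelta(days=400)),
         FileInfo("/gpfs/home/today.txt", now)]
for f in files:
    f.pool = placement_rule(f)       # new-file placement
moves = management_rules(files, now)  # periodic scan
print(moves)
```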
Globally, Active Cloud Engine makes the global namespace truly global, allowing the file system to span multiple geographic locations. Built-in intelligence moves individual files to where they are closest to the users that use them most. This includes an intelligent push-over-WAN write cache, on-demand pull-from-WAN cache for reads, and will even pre-fetch subsets of files.
No other scale-out NAS solution from any other storage vendor offers this amazing and awesome capability!
- IBM® Storwize® V7000
Last year, we introduced the [IBM Storwize V7000], a midrange disk system with block-level access via FCP and iSCSI protocols. The 2U-high control enclosure held two canister nodes, a 12-drive or 24-drive bay, and a pair of power-supply/battery UPS modules. The controller could attach up to nine expansion enclosures for more capacity, as well as virtualize other storage systems. This has been one of our most successful products ever, selling over 100PB in the past 12 months to over 2,500 delighted customers.
The 12-drive enclosure now supports both 2TB and 3TB NL-SAS drives. The 24-drive enclosures support 200/300/400GB Solid-State Drives (SSD), 146 and 300GB 15K RPM drives, 300/450/600GB 10K RPM drives, and a new 1TB NL-SAS drive option. For those who want to set up "Flash-and-Stash" in a single 2U drawer, now you can combine SSD and NL-SAS in the 24-drive enclosure! This is the perfect platform for IBM's Easy Tier sub-LUN automated tiering. IBM's Easy Tier is substantially more powerful and easier to use than EMC's FAST-VP or HDS's Dynamic Tiering.
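Sub-LUN automated tiering works on extents rather than whole volumes: the most heavily accessed extents move to SSD, and the rest stay on spinning disk. The sketch below shows a generic heat-count version of that idea. It is not IBM's Easy Tier algorithm, which is more sophisticated; the extent IDs and the workload are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def plan_ssd_extents(io_log: Iterable[int], ssd_slots: int) -> Set[int]:
    """Generic heat-based sub-LUN tiering: count I/Os per extent during an
    observation window and keep the hottest `ssd_slots` extents on SSD.
    Everything else stays on (or is demoted to) NL-SAS."""
    heat = Counter(io_log)
    return {extent for extent, _count in heat.most_common(ssd_slots)}


# Hypothetical window: extent IDs in the order their I/Os arrived.
window = [7, 7, 7, 2, 2, 9, 4, 4, 4, 4]
print(sorted(plan_ssd_extents(window, ssd_slots=2)))   # [4, 7]
```

This is also why "Flash-and-Stash" works in a single enclosure: only a small SSD capacity is needed if the workload's hot extents are a small fraction of the total.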
Last week, at Oracle OpenWorld, there were various vendors hawking their DRAM/SSD-only disk systems, including my friends at Texas Memory Systems, Pure Storage, and Violin Memory Systems. When people came to the IBM booth to ask what IBM offers, I explained that both the IBM DS8000 and the Storwize V7000 can be outfitted in this manner. With the Storwize V7000, you can buy as much or as little SSD as you like. You do not have to buy these drives in groups of 8 or 16 at a time.
The Storwize V7000 is the sister product of the IBM SAN Volume Controller, so you can replicate between one and the other. I see two use cases for this. First, you might have an SVC at a primary location, and decide to replicate just the subset of mission-critical production data to a remote location, using the Storwize V7000 as the target device. Second, you could have three remote or branch offices (ROBO) that replicate to a centralized data center SAN Volume Controller.
Lastly, like the SVC, the Storwize V7000 now supports clustering, so you can combine multiple control enclosures together to make a single system.
- IBM® Storwize® V7000 Unified
Do you remember how IBM combined the best of SAN Volume Controller, XIV and DS8000 RAID into the Storwize V7000? Well, IBM did it again, combining the best of the Storwize V7000 with the common NAS software base developed for SONAS into the new "Storwize V7000 Unified".
You can upgrade your block-only Storwize V7000 into a file-and-block "Storwize V7000 Unified" storage system. This is a 6U-high system, consisting of a pair of 2U-high file modules connected to a standard 2U-high control enclosure. Like the block-only version, the control enclosure can attach up to nine expansion enclosures, as well as all the same support to virtualize external disk systems. The file modules combine the management node, interface node and storage node functionality that SONAS R1.3 offers.
What exactly does that mean for you? In addition to FCP and iSCSI for block-level LUNs, you can carve out file systems that support NFS, CIFS, FTP/SFTP, HTTP/HTTPS, and SCP protocols. All the same support as SONAS for anti-virus checking, access-based enumeration, integrated TSM backup and HSM functionality to migrate data to tape, NDMP backup support for other backup software, and Active Cloud Engine's local file management are all included!
- IBM SAN Volume Controller V6.3
The SAN Volume Controller [SVC] now extends its stretched cluster to distances of up to 300km. This is 3x further than EMC's VPLEX offering. This keeps copies of data identical in both locations, and allows Live Partition Mobility or VMware vMotion to move workloads seamlessly from one data center to another. Combining two data centers with an SVC stretched cluster is often referred to as "Data Center Federation".
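Distance matters because every mirrored write has to cross the link and be acknowledged. A rough Python sketch, assuming about 5 microseconds of one-way propagation delay per km of optical fiber and ignoring switch, protocol and controller overhead, shows the minimum round-trip penalty at 300km and at the 100km implied by the "3x" comparison.

```python
# Rough propagation delay for synchronous mirroring across a stretched cluster.
FIBER_US_PER_KM = 5.0   # approx. one-way delay: light travels ~200,000 km/s in fiber


def round_trip_ms(distance_km: float) -> float:
    """Minimum added latency for one request/acknowledge round trip,
    ignoring switch, protocol and controller overhead."""
    return 2 * distance_km * FIBER_US_PER_KM / 1000


for km in (100, 300):
    print(km, "km ->", round_trip_ms(km), "ms")
# 100 km -> 1.0 ms
# 300 km -> 3.0 ms
```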
The SVC also introduces a low-bandwidth option for Global Mirror. We actually borrowed this concept from our XIV disk system. Normally, SVC's Global Mirror will consume all the bandwidth it can to keep the destination copy of the data within a few seconds of currency behind the source copy. But do you always need to be that current? Can you afford the bandwidth requirements needed to keep up with that? If you answered "No!" to either of these, then the low-bandwidth option is for you. Basically, a FlashCopy is done on the source copy, this copy is then sent over to the destination, and a FlashCopy is made of that. The process is then repeated on a scheduled basis, like every four hours. This greatly reduces the amount of bandwidth required, and for many workloads, having currency in hours, rather than seconds, is good enough.
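The bandwidth savings come from coalescing: if the same block is rewritten several times between FlashCopy cycles, only its final contents need to cross the WAN. This simplified Python model (hypothetical workload, four-hour cycles as in the example above) counts blocks shipped under each approach; it says nothing about how SVC tracks changes internally.

```python
from typing import Dict, List, Sequence, Set, Tuple

Write = Tuple[float, int]   # (time in hours, block number written)


def continuous_blocks_sent(writes: Sequence[Write]) -> int:
    """Continuous asynchronous mirroring (simplified): every write is shipped."""
    return len(writes)


def cycled_blocks_sent(writes: Sequence[Write], cycle_hours: float) -> List[int]:
    """Cycle-based mirroring (simplified): at each cycle, only blocks changed since
    the previous point-in-time copy are shipped, so repeated overwrites of the same
    block within one cycle cross the WAN once. Returns blocks sent per cycle."""
    changed: Dict[int, Set[int]] = {}
    for t, block in writes:
        changed.setdefault(int(t // cycle_hours), set()).add(block)
    return [len(changed[c]) for c in sorted(changed)]


# Hypothetical workload: block 1 is rewritten repeatedly.
writes = [(0.5, 1), (1.0, 1), (2.0, 2), (3.9, 1), (4.1, 1), (5.0, 3)]
print(continuous_blocks_sent(writes))                  # 6
print(cycled_blocks_sent(writes, cycle_hours=4))       # [2, 2]
```

The longer the cycle, the more overwrites get absorbed, which is exactly the trade of currency for bandwidth described above.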
I am very excited about all these announcements! It is a good time to be working for IBM, and I look forward to sharing these exciting enhancements with clients at the Tucson EBC.
technorati tags: IBM, SONAS, GPFS, SAS, NL-SAS, Active Cloud Engine, Global+Namespace, Storwize+V7000, V7000U, V7000 Unified, block-only, block-and-file, SVC, SSD, Easy Tier, Flash-and-Stash, Texas Memory Systems, Pure Storage, Violin Memory
This post concludes my series of posts on Oracle OpenWorld 2011 conference. Here are some pictures from Wednesday and Thursday.
- IBM as the yardstick against which everyone measures
Our friends at Violin Memory Systems mentioned our joint success results with IBM GPFS, scanning 10 billion files in less than an hour. (Their booth must have been slow, because members of their team spent a lot of time in our IBM booth!)
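For a sense of scale, here is the minimum average rate that scanning 10 billion files in under an hour requires. This is simple arithmetic on the figures quoted above, not a separate benchmark result.

```python
def min_scan_rate(files: int, seconds: float) -> float:
    """Average files per second needed to finish within the given time."""
    return files / seconds


rate = min_scan_rate(10_000_000_000, 3600)        # 10 billion files in one hour
print(f"{rate / 1e6:.2f} million files/second")   # 2.78 million files/second
```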
In fact, it seemed every company compared itself to IBM in one fashion or another. Oracle CEO Larry Ellison said that "IBM is a great company" and mentioned IBM systems several times in comparisons to Oracle's newly announced hardware offerings.
- Larry's Sailing Vessel
When things slowed down, I took a walk to see the other parts of the exhibition area. In the Moscone West building was Larry's trimaran that won [last year's America's Cup].
I used to sail myself, and have been part of crews in sailing races in both Japan and Australia. A few years ago, I watched the America's Cup time trials in New Zealand.
- On the Streets of San Francisco
On the streets, IBM advertised some of its products where thousands of attendees would see them every day. Here we have some factoids related to IBM Netezza and DB2 database on POWER servers. We were very careful not to mention either product in the IBM booth itself, as we all understand that IBM is a guest in Oracle's house this week. We certainly don't want to upset Larry and have him treat IBM the way he treated HP last year, or Salesforce.com this year.
- Rest in Peace, Steve Jobs, 1955-2011
On Wednesday evening at Oracle OpenWorld, we were tearing down the booth when we heard that Apple co-founder Steve Jobs had passed away. This is truly a loss for the entire IT industry. I never met Steve in person, nor have I been to any Apple conferences like Macworld that he spoke at.
At various keynote sessions, Larry Ellison compared his Oracle products to those of Apple, Inc., suggesting that Oracle is the "Apple for the Enterprise".
On our way back to the Hilton hotel on O'Farrell, there was a candlelight vigil at the Apple Store near Union Square. People left sticky notes on the glass window.
There were a lot of tributes to Steve Jobs, but I liked this 15-minute video of his 2005 Commencement Speech at Stanford University titled [How to Live before you Die].
This will be one of those moments where years later, many people will remember exactly where they were, and what they were doing, when they heard the news. For many, that news came as tweets or text messages on the very iPhones and iPads he helped design.
- Rock Concert - Wednesday night
On Wednesday evening, I joined thousands of other attendees on Treasure Island to hear and watch Sting, Tom Petty and the Heartbreakers, and the English Beat in concert. It was cold and dark, but we all had a good time. Needless to say, I didn't make it to Marc Benioff's 8:00am Thursday morning session!
A word of advice: If you go to an evening rock concert at Treasure Island, dress warmly!
Despite the sad news about Steve Jobs, I had a great time at this conference. I learned a lot about what other IT vendors are doing, talked to dozens of IBM clients at the booth, and got to make some new friends that work in other parts of IBM.
(FTC Disclosure: I work for IBM. IBM and Apple are technology partners. I proudly own an Apple iPod, several Mac Mini computers and shares of stock in both IBM and Apple, Inc.)
technorati tags: IBM, Violin+Memory, GPFS, Americas+Cup, Netezza, DB2, POWER, HP, Salesforce.com, Steve Jobs, Hilton, Union Square, Stanford University, Treasure Island, Sting, Tom Petty, English Beat