Continuing my coverage of the Data Center 2010 conference, Monday I attended four keynote sessions.
- Opening Remarks
The first keynote speaker started out with an [English proverb]: Turbulent waters make for skillful mariners.
He covered the state of the global economy and how CIOs should address the challenge. We are on the flat end of an "L-shaped" recovery in the United States. GDP growth is expected to be only 4.7 percent Latin America, 2.3 percent in North America, 1.5 percent Europe. Top growth areas include 8.0 percent India and 8.6 percent China, with an average of 4.7 growth for the entire Asia Pacific region.
On the technical side, the top technologies that CIOs are pursuing for 2011 are Cloud Computing, Virtualization, Mobility, and Business Intelligence/Analytics. He asked the audience if the "Stack Wars" for integrated systems are hurting or helping innovation in these areas.
Move over "conflict diamonds", companies now need to worry about [conflict minerals].
He proposed an alternative approach called Fabric-Based Infrastructure. In this new model, a shared pool of servers is connected to a shared pool of storage over an any-to-any network. In this approach, IT staff spend all of their time just stocking up the vending machine, allowing end-users to get the resources they need.
- Crucial Trends You Need to Watch
The second speaker covered ten trends to watch, but these were not limited to just technology trends.
- Virtualization is just beginning - even though IBM has had server virtualization since 1967 and storage virtualization since 1974, the speaker felt that adoption of virtualization is still in its infancy. Ten years ago, average CPU utilization for x86 servers of was only 5-7 percent. Thanks to server virtualization like VMware and Hyper-V, companies have increased this to 25 percent, but many projects to virtualized have stalled.
- Big Data is the elephant in the room - storage growth is expected to grow 800 percent over the next 5 years.
- Green IT - Datacenters consume 40 to 100 times more energy than the offices they support. Six months ago, Energy Star had announced [standards for datacenters] and energy efficiency initiatives.
- Unified Communications - Voice over IP (VoIP) technologies, collaboration with email and instant messages, and focus on Mobile smartphones and other devices combines many overlapping areas of communication.
- Staff retention and retraining - According to US Labor statistics, the average worker will have 10 to 14 different jobs by the time they reach 38 years of age. People need to broaden their scope and not be so vertically focused on specific areas.
- Social Networks and Web 2.0 - the keynote speaker feels this is happening, and companies that try to restrict usage at work are fighting an uphill battle. Better to get ready for it and adopt appropriate policies.
- Legacy Migrations - companies are stuck on old technology like Microsoft Windows XP, Internet Explorer 6, and older levels of Office applications. Time is running out, but migration to later releases or alternatives like Red Hat Linux with Firefox browser are not trivial tasks.
- Compute Density - Moore's Law that says compute capability will double every 18 months is still going strong. We are now getting more cores per socket, forcing applications to re-write for parallel processing, or use virtualization technologies.
- Cloud Computing - every session this week will mention Cloud Computing.
- Converged Fabrics - some new approaches are taking shape for datacenter design. Fabric-based infrastructure would benefit from converging SAN and LAN fabrics to allow pools of servers to communicate freely to pools of storage.
He sprinkled fun factoids about our world to keep things entertaining.
- 50 percent of today's 21-year-olds have produced content for the web. 70 percent of four-year-olds have used a computer. The average teenager writes 2,282 text messages on their cell phone per month.
- This year, Google averaged 31 billion searches per month, compared 2.6 billion searches per month in 2007.
- More video has been uploaded to YouTube in the last two months than the three major US networks (ABC, NBC, CBS) have aired since 1948.
- Wikipedia averages 4300 new articles per day, and now has over 13 million articles.
- This year, Facebook reached 500 million users. If it were a country, it would be ranked third. Twitter would be ranked 7th, with 69% of their growth being from people 32-50 years old.
- In 1997, a GB of flash memory cost nearly $8000 to manufacture, today it is only $1.25 instead.
- The computer in today's cell phone is million times cheaper, and thousand times more powerful, than a single computer installed at MIT back in 1965. In 25 years, the compute capacity of today's cell phones could fit inside a blood cell.
See [interview of Ray Kurzweil] on the Singularity for more details.
- The Virtualization Scenario: 2010 to 2015
The third keynote covered virtualization. While server virtualization has helped reduce server costs, as well as power and cooling energy consumption, it has had a negative effect on other areas. Companies that have adopted server virtualization have discovered increased costs for storage, software and test/development efforts.
The result is a gap between expectations and reality. Many virtualization projects have stalled because there is a lack of long-term planning. The analysts recommend deploying virtualization in stages, tackle the first third, so called "low hanging fruit", then proceed with the next third, and then wait and evaluate results before completing the last third, most difficult applications.
Virtualization of storage and desktop clients are completely different projects than server virtualization and should be handled accordingly.
- Cloud Computing: Riding the Storm Out
The fourth keynote focus on the pros and cons of Cloud Computing. First they start by defining the five key attributes of Cloud: self-service, scalable elasticity, shared pool of resources, metered and paid per use, over open standard networking technologies.
In addition to IaaS, PaaS and SaaS classifications, the keynote speaker mentioned a fourth one: Business Process as a Service (BPaaS), such as processing Payroll or printing invoices.
While the debate rages over the benefits between private and public cloud approaches, the keynote speaker brings up the opportunites for hybrid and community clouds. In fact, he felt there is a business model for a "cloud broker" that acts as the go-between companies and cloud service providers.
A poll of the audience found the top concerns inhibiting cloud adoption were security, privacy, regulatory compliance and immaturity. Some 66 percent indicated they plan to spend more on private cloud in 2011, and 20 percent plan to spend more on public cloud options. He suggested six focus areas:
- Test and Development
- Prototyping / Proof-of-Concept efforts
- Web Application serving
- SaaS like email and business analytics
- Department-level applications
- Select workloads that lend themselves to parallelization
The session wrapped up with some stunning results reported by companies. Server provisioning accomplished in 3-5 minutes instead of 7-12 weeks. Reduced cost of email by 70 percent. Four-hour batch jobs now completed in 20 minutes. 50 percent increase in compute capacity with flat IT budget. With these kind of results, the speaker suggests that CIOs should at least start experimenting with cloud technologies and start to profile their workloads and IT services to develop a strategy.
That was just Monday morning, this is going to be an interesting week!
technorati tags: IBM, GDP, Cloud Computing, virtualization, mobility, BI, CIO, Big Data, Green IT, Google, Twitter, Facebook, IaaS, PaaS, SaaS, BPaaS
Continuing my coverage of the 30th annual [Data Center Conference]. Here is a recap of the Tuesday morning sessions:
- Wells Fargo: Data Center Lessons Learned from the Wachovia Acquisition
This was the next in their "Mastermind Interview" series. The analyst interviewed Scott Dillon, EVP and Head of Technology Infrastructure Services for Wells Fargo bank. Some 13 years ago, Wells Fargo merged with Norwest, and three years ago, Wells Fargo merged again, this time with Wachovia bank. Today, the new merged Wells Fargo manages 1.2 Trillion USD in assets, some 12,000 ATMs, and 9,000 branch offices within two miles of 50 percent of the US population.
On the technical side, Scott's team has to deal with 10,000 IT changes per month, spanning 85 discrete businesses that Wells Fargo is involved in. To help drive the consolidation, they formed a culture group called "One Wells Fargo".
Often, Wells Fargo and Wachovia used different applications for the same function. The consolidation team took the A-or-B-but-not-C approach, which means they would either choose the existing application that Wells Fargo was already using (A), or the one that Wachovia was already using (B), but not look for a replacement (C). They also wanted to avoid re-platforming any apps during the merger. This simplified the process of developing target operating models (TOMs).
Before each application cut-over, the consolidation team did dry-run, dress rehearsals and walkthroughs over the phone to ensure smooth success. They wanted a Wachovia account holder to be able to walk into the bank on one day, and then come back the next day as a Wells Fargo account holder, into the same branch office but now with Wells Fargo signage, with minimal disruption.
Wells Fargo also adopted a test-to-learn approach of choosing small test markets to see how well the transition would work before tackling larger, more complicated markets. For example, they started in Colorado, where Wells Fargo has a huge presence, but Wachovia had a small presence.
This was first and foremost a business merger, not just an IT merger. Each decision to 6-18 months to act on, and the IT team spent the last three years working every weekend to make this a reality.
- A Satirical Look at Business and Technology
Comedian Bob Hirschfeld presented a light-hearted look at the IT industry. Bob actually attended sessions on Monday at this conference so his satire was exceptionally hard-hitting. He took jabs at the latest IT job requirements, padding on light poles, IBM Watson, social media's impact on dictators, various industry acronyms, virtualization, the various reasons why printer ink is so expensive, and the evil masterminds behind Powerpoint.
- Storing Big Data takes a Village
Two analysts co-presented this session on the 12 dimensions of information management that revolve around the volume, variety and velocity of "Big Data".
In the past, it took a while to gather data, and a while to process the data, so annual, quarterly and monthly reports were common. Today, with high-velocity streams like Twitter, especially during cultural events or natural disasters, data is produced and analyzed quickly. It is important to sort the steady-state from the anomalies.
Myth 1: All data fits nicely into relational databases. The analysts feel the concept of putting everything into one big data base is dead. Some data sets are so complicated that traditional database joins would cause smoke to come out of the sides of the servers. Instead, new technologies have emerged, including NoSQL, Cassandra, Hadoop, Columnar databases, and In-memory databases. XML has helped to bring together disparate data formats.
Companies need to adapt to this new reality of Business Analytics. Here is a poll of the audience on how many are in what stage of adaptation:
Myth 2: Everyone will do Big Data with commodity hardware. Businesses want commmercial offerings that don't fail every day. (For example, instead of using open-source Hadoop, consider IBM's [InfoSphere BigInsights] commercial product based on Hadoop designed for the Enterprise).
Myth 3: Big Data is too big for backup. Certainly, traditional full-plus-incremental approaches fail to scale, but that is not the only option you have. Consider disk replication, snapshots, and integrated disk-and-tape blended solutions that adopt a more progressive backup methodology.
Capacity forecasting can be difficult with Big Data. Scale-out NAS systems, including IBM SONAS and the various me-too competitive offerings, were originally focused on High Performance Computing (HPC) and the Media & Entertainment (M&E) industries, are now ready for prime-time and appropriate for other use cases.
It's like the game of Clue, but instead of Professor Plum with the candlestick in the library, it was Chuck with the Cluster in the Closet. To avoid shadow IT creating huge Hadoop Clusters in your closets, encourage the use of Cloud Computing for "sandbox" projects. IBM, Amazon and others offer hosted MapReduce engines for this purpose.
What type of storage do you plan to use for Big Data? The top five, weighted from a list during a poll of the audience were: (78) traditional disk arrays, (71) Scale-out NAS, (46) pre-configured appliances, (30) Hadoop clusters, and (23) Cloud Storage.
Big Data is about doing things differently. Do your employees understand analytical techniques? Your company may need to start thinking about policies for capturing Big Data, storing it correctly, and analyzing it for insights and patterns needed to stay competitive.
It was good to mix reality with a bit of humor. Some of these conference attendees take themselves too seriously, and it is good to be reminded that IT is just part of the overall business operation.
technorati tags: IBM, Wells Fargo, Wachovia, Scott Dillon, , Bob Hirschfeld, Big Data, SONAS, NoSQL, Cassandra, Hadoop, Columnar databases, business analytics
Modified by TonyPearson
This week I am in Orlando, Florida for the IBM Edge conference. Thursday evening after all the other sessions, we had a Free-for-All, a Q&A panel across all storage topics, moderated by Scott Drummond. The conference officially ends at noon tomorrow, but for many, this is the last session, as people fly out Friday morning. Here are the questions and the panel responses during the session.
When will IBM unify their storage management between Mainframe z/OS and the distributed systems platforms?
IBM offers a Change and Configuration Management Data Base (CCMDB) for this purpose with appropriate collectors from z/OS and distributed systems, but hasn't sold well.
When will IBM devices have RESTful interfaces?
Both IBM Systems Director and IBM Tivoli Storage Productivity Center (TPC) offer RESTful APIs. IBM Systems Director can manage z/VM and Linux on System z, as well as Power Systems and x86 based distributed systems. Since October 2008, IBM's Project Zero introduced RESTful interfaces to PHP and Groovy software running on WebSphere sMash environments. We have not heard much about this since 2008.
Will IBM TPC support NPIV on Power Systems?
TPC 5.1 has toleration support for this, showing the first port connection discovered, but not all connections, and we expect to retrofit this toleration to TPC 4.2.2 Fixpack 2. Hopefully, we will have full support in a future release.
We would like TPC for Replication to run on Linux for System z. We do not run z/OS at the disaster recovery site location.
Submit an IBM Request for Enhancement [RFE] for this. We have TPC for Replication on z/OS, as well as the distributed systems version that runs on Windows, Linux and AIX.
We have enhancements we would like to see for XIV and SONAS also, can we use the RFE process for this also?
Yes, submit the requirements for our review.
We heard the Statement of Direction that there would be storage integrated into the PureSystems. What exactly does that mean?
The PureSystems family of expert-integrated systems is based on a new chassis that has a front part, a midplane, and a back-part. All IBM System Storage products that support x86 and Power Systems can work with PureSystems. However, IBM does not yet offer storage that fits in the front part of the PureFlex chassis, but the Statement of Direction indicates that we intend to offer that option. Until then, the IBM Storwize V7000 is the storage of choice that can be put into the PureSystems rack, but outside the individual chasses.
We see some features like Real-Time Compression being put into the SAN Volume Controller (SVC), and other features put into the back-end devices. How are we supposed to make sense of this?
IBM's new pilot program, the SmartCloud Virtual Storage Center, to bring these all together. In general, we have design teams of system architects that determine which features go in which products, and prioritize accordingly.
We heard the IBM Executives during the opening session indicate that IBM's strategy involves supporting Big Data, but I haven't seen any storage that supports native Hadoop interfaces. Did I miss something?
First, I want to emphasize that Big Data is more than just MapReduce workloads. IBM offers Streams and BigInsights software to handle text, as well as Business Intelligence and Data Warehouse solutions for structured data. IBM's General Parallel File System (GPFS) has a Shared-Nothing-Cluster (SNC) mode with Hadoop interfaces that runs twice as fast as Hadoop's native HDFS file system. The storage products we recommend for Big Data are the SONAS and the DCS3700 disk systems, as both are optimized for the sequential workloads Big Data represents.
Everytime we upgrade our SVC, we review the list for SDDPCM multi-pathing and see that we need to upgrade our back-end DS8000 microcode up to recommended levels. Can we get a list of combinations that work from other customers?
The advantage of storage hypervisors like SVC is that we can separate the multi-pathing driver from the back-end managed disk systems. You only need the SDDPCM to support the SVC, not the back-end devices. For the most part, SVC has not dropped support for any level of previously supported OS or multi-pathing software.
On SVC, when we migrate volumes (vDisks) from one storage pool to another, we would like to throttle this process during FlashCopy.
Yes, we had several requests like this, which is why we now recommend using Volume Mirorring to perform migrations. In fact the GUI wizard uses Volume Mirroring by default when migrations are performed. As for throttling, IBM has implemented "I/O Priority Manager" that offers Quality of Service classes for DS8000 and XIV Gen3, and might consider porting this to other products in our portfolio.
Sizing systems is an art. I just need to know if the DS8000 is running hot. Can we have the equivalent of "red lines" for our disk systems similar to automobile engines?
Storage Optimizer was added to TPC 4.2 to help in this area, identifying heat-maps for IBM DS8000, DS6000, DS5000, DS4000, SVC and Storwize V7000. We recommend you look at the performance violation reports.
How can we evaluate the characteristics of our workloads?
Yes, TPC can do this.
When we are replacing non-IBM storage with IBM, we don't have good tools to evaluate the non-IBM equipment. What is IBM doing for this?
IBM's Disk Magic modeling tool can take inputs from a variety of sources, including iostat from the servers themselves. You can also install a 90-day trial of TPC to help with this.
We really like EMC's "Grab" program, does IBM have one also?
Yes, IBM has one also. See the [SSIC Discovery Utility].
Updating the Host Attachment Kit (HAK) for AIX is quite painful for the SVC. We prefer the method employed for the XIV.
Thanks for the feedback.
For SVC, we need to correlate disk with VMware and VIOS. Can we get vSCSI information on VIOS?
TPC 5.1 has this support, and we believe it has been retrofitted to TPC 4.2.2 Fixpack 2, coming out this month.
Currently, with SVC, when volumes are part of a Global Mirror (GM) session, we need to cancel GM, expand the source volume, expand the target volume, then restart GM. We would like this to be fully automated and non-disruptive.
Sounds like a great requirement to submit for the RFE process.
Can we get an RSS Feed for the RFE community.
Yes, you can subscribe to it. You can also set up "Watch Lists".
Thanks to all of the IBM experts on the panel for their participation at this event!
technorati tags: IBM, Edge2012, Free-for-All, CCMDB, Project Zero, RESTful, TPC, SVC, RFE, Storwize V7000, PureSystems, PureFlex, SmartCloud, Virtual Storage Center, Big+Data, SONAS, XIV, DS8000, Global Mirror
Continuing my coverage of the 30th annual [Data Center Conference]. Here is a recap of more of the Tuesday afternoon sessions:
- IBM CIOs and Storage
Barry Becker, IBM Manager of Global Strategic Outsourcing Enablement for Data Center Services, presented this session on Storage Infrastructure Optimization (SIO).
A bit of context might help. I started my career in DFHSM which moved data from disk to tape to reduce storage costs. Over the years, I wouuld visit clients, analyze their disk and tape environment, and provide a set of recommendations on how to run their operations better. In 2004, this was formalized into week-long "Information Lifecycle Management (ILM) Assessments", and I spent 18 months in the field training a group of folks on how to perform them. The IBM Global Technology Services team have taken a cross-brand approach, expanding this ILM approach to include evaluations of the application workloads and data types. These SIO studies take 3-4 weeks to complete.
Over the next decade, there will only be 50 percent more IT professionals than we have today, so new approaches will be needed for governance and automation to deal with the explosive growth of information.
SIO deals with both the demand and supply of data growth in five specific areas:
- Data reclamation, rationalization and planning
- Virtualization and tiering
- Backup, business continuity and disaster recovery
- Storage process and governance
- Archive, Retention and Compliance
The process involves gathering data and interview business, financial and technical stakeholders like storage administrators and application owners. The interviews take less than one hour per person.
Over the past two years, the SIO team has uncovered disturbing trends. A big part of the problem is that 70 percent of data stored on disk has not been accessed in the past 90 days, and is unlikely to be accessed at all in the near future, so would probably be better to store on lower cost storage tiers.
Storage Resource Management (SRM) is also a mess, with over 85 percent of clients having serious reporting issues. Even rudimentary "Showback" systems to report back what every individual, group or department were using resulted in significant improvement.
Archive is not universally implemented mostly because retention requirements are often misunderstood. Barry attributed this to lack of collaboration between storage IT personnel, compliance officers, and application owners. A "service catalog" that identifies specific storage and data types can help address many of these concerns.
The results were impressive. Clients that follow SIO recommendations save on average 20 to 25 percent after one year, and 50 percent after three to five years. Implementing storage virtualization averaged 22 percent lower CAPEX costs. Those that implemented a "service catalog" saved on average $1.9 million US dollars. Internally, IBM's own operations have saved $13 million dollars implementing these recommendations over the past three years.
- Reshaping Storage for Virtualization and Big Data
The two analysts presenting this topic acknowledged there is no downturn on the demand for storage. To address this, they recommend companies identify storage inefficiencies, develop better forecasting methodologies, implement ILM, and follow vendor management best practices during acquisition and outsourcing.
To deal with new challenges like virtualization and Big Data, companies must decide to keep, replace or supplement their SRM tools, and build a scalable infrastructure.
One suggestion to get upper management to accept new technologies like data deduplication, thin provisioning, and compression is to refer to them as "Green" technologies, as they help reduce energy costs as well. Thin provisioning can help drive up storage utilization to rates as high as you dare, typically 60 to 70 percent is what most people are comfortable with.
A poll of the audience found that top three initiatives for 2012 are to implement data deduplication, 10Gb Ethernet, and Solid-State drives (SSD).
The analysts explained that there are two different types of cloud storage. The first kind is storage "for" the cloud, used for cloud compute instances (aka Virtual Machines), such as Amazon EBS for EC2. The second kind is storage "as" the cloud, storage as a data service, such as Amazon S3, Azure Blob and AT&T Synaptic.
The analysts feel that cloud storage deployments will be mostly private clouds, bursting as needed to public cloud storage. This creates the need for a concept called "Cloud Storage Gateways" that manage this hybrid of some local storage and some remote storage. IBM's SONAS Active Cloud Engine provides long-distance caching in this manner. Other smaller startups include cTera, Nasuni, Panzura, Riverbed, StorSimple, and TwinStrata.
A variation of this are "storage gateways" for backup and archive providers as a staging area for data to be subsequently sent on to the remote location.
New projects like virtualization, Cloud computing and Big Data are giving companies a new opportunity to re-evaluate their strategies for storage, process and governance.
technorati tags: IBM, SIO, SRM, deduplication, 10GbE, SSD, Amazon, EBS, EC2, Azure, SONAS, Active Cloud Engine, Cloud Computing, virtualization, Big Data
This week I am in Moscow, Russia for today's "Edge Comes to You" event. Although we had over 20 countries represented at the Edge2012 conference in Orlando, Florida earlier this month, IBM realizes that not everyone can travel to the United States. So, IBM has created the "Edge Comes to You" events where a condensed subset of the agenda is presented. Over the next four months, these events are planned in about two dozen other countries.
This is my first time in Russia, and the weather was very nice. With over 11 million people, Moscow is the 6th largest city in the world, and boasts having the largest community of billionaires. With this trip, I have now been to all five of the so-called BRICK countries (Brazil, Russia, India, China and Korea) in the past five years!
The venue was the [Info Space Transtvo Conference Center] not far from the Kremlin. While Barack Obama was making friends with Vladimir Putin this week at the G2012 Summit in Mexico, I was making friends with the lovely ladies at the check-in counter.
If it looks like some of the letters are backwards, that is not an illusion. The Russian language uses the [Cyrillic alphabet]. The backwards N ("И"), backwards R ("Я"), the number 3 ("З), and what looks like the big blue staple logo from Netapp ("П"), are actually all characters in this alphabet.
Having spent eight years in a fraternity during college, I found these not much different from the Greek alphabet. Once you learn how to pronounce each of the 33 characters, you can get by quite nicely in Moscow. I successfully navigated my way through Moscow's famous subway system, and ordered food on restaurant menus.
The conference coordinators were Tatiana Eltekova (left) and Natalia Grebenshchikova (right). Business is booming in Russia, and IBM just opened ten new branch offices throughout the country this month. So these two ladies in the marketing department have been quite busy lately.
I especially liked all the attention to detail. For example, the signage was crisp and clean, and the graphics all matched the Powerpoint charts of each presentation.
Moscow is close to the North pole, similar in latitude as Juneau, Alaska; Edinburgh, Scottland; Copenhagen, Denmark; and Stockholm, Sweden.
As a result, it is daylight for nearly 18 hours a day. The first part of the day, from 8:00am to 4:30pm, was "Technical Edge", a condensed version of the 4.5 day event in Orlando, Florida. I gave three of the five keynote presentations:
- Game Change on a Smarter Planet: A New Era in IT, discussing Smarter Computing and Expert-Integrated systems, based on what Rod Adkins presented in Orlando.
- A New Approach to Storage, explaining IBM Smarter Storage for Smarter Computing, IBM's new approach to the way storage is designed and deployed for our clients
- IBM Watson: How it Works and What it Means for Society Beyond Winning Jeopardy! explaining how IBM Watson technologies are being used in Healthcare and Financial Services, based on what I presented in Orlando.
(Note: I do not speak Russian fluently enough to give a technical presentation, so I did then entire presentation in English, and had real-time translators convert to Russian for me. The audience wore headphones. However, I was able to sprinkly a few Russian phrases, such as "доброе утро", "Я не понимаю по-русский" and "спасибо".)
After the keynote sessions, I was interviewed by a journalist for [Storage News] magazine. The questions covered a variety of topics, from the implications of [Big Data analytics] to the future of storage devices that employ [Phase Change Memory]. I look forward to reading the article when it gets published!
The afternoon had break-out sessions in three separate rooms. Each room hosted seven topics, giving the attendees plenty to choose from for each time slot. I presented one of these break-out sessions, Big Data Cloud Storage Technology Comparison. The title was already printed in all the agendas, so we went with it, but I would have rather called it "Big Data Storage Options". In this session, I explained Hadoop, InfoSphere BigInsights, internal and external storage options.
I spent some time comparing Hadoop File System (HDFS) with IBM's own General Parallel File System (GPFS) which now offers Hadoop interfaces in a Shared-Nothing Cluster (SNC) configuration. IBM GPFS is about twice as fast as HDFS for typical workloads.
At the end of the Technical Edge event, there was a prize draw. Business cards were drawn at random, and three lucky attendees won a complete four-volume set of my book series "Inside System Storage"! Sadly, these got held up in customs, so we provided a "certificate" to redeem them for the books when they arrive to the IBM office.
The second part of the day, from 5:00pm to 8pm, was "Executive Edge", a condensed version of the 2 day event in Orlando, designed for CIOs and IT leaders. Having this event in the evening allowed busy executives to come over after they spend the day in the office. I presented IBM Storage Strategy in the Smarter Computing Era, similar to my presentation in Orlando.
Both events were well-attended. Despite fighting jet lag across 11 time zones, I managed to hang in there for the entire day. I got great feedback and comments from the attendees. I look forward to hearing how the other "Edge Comes to You" events fare in the other countries. I would like to thank Tatiana and Natalia for their excellent work organizing and running this event!
technorati tags: IBM, Moscow, Russia, Edge, ECTY, Cyrillic, Tatiana Eltekova, Natalia Grebenshchikova, Smarter Storage, Smarter Computing, Smarter Planet, Big Data, Cloud, IBM Watson, Jeopardy, Hadoop, HDFS, InfoSphere, BigInsights, GPFS, GPFS-SNC
Wrapping up my week's coverage of the IBM Pulse 2011 conference, I have had several people ask me to explain IBM's latest initiative, Smarter Computing, which IBM launched this week at this conference. Having led the IT industry through the Centralized Computing era and the Distributed Computing era, IBM is now well-positioned to help companies, governments and non-profit organizations to enter the new Smarter Computing era, focused on insight and discovery.
|Centralized Computing||Distributed Computing||Smarter Computing|
- Thousands of IT professionals
- Mainframe servers
- Effiicent, but only the largest companies and governments had them
- Millions of office workers
- Personal computers (PC)
- Innovative, extending the reach to small and medium-sized businesses, but resulted in server sprawl and increased TCO
- Billions of people
- Smart phones and other handheld devices
- Efficient and Innovative, combining the best of centralized and distributed computing
|1952 to 1980||1981 to 2010||2011 and beyond|
To help clients with this transition, IBM's Smarter Computing initiative has three main components. This is a corporate-wide strategy, with systems, software and services all working together to realize results.
- Big Data
The first component is Big Data. This combines three different sources of data:
- Traditional structured data in OLTP databases and OLAP data warehouses, using data management solutions like DB2 and IBM Netezza.
- Unstructured data, including text documents, images, audio, and video, processed with massive parallelism using IBM BigInsights and Apache Hadoop.
- Real-Time Analytics Processing (RTAP) of incoming data, including video surveillance, social media, RFID chips, smart meters, and traffic control systems, processed with IBM InfoSphere Streams
Of course, Big Data will bring new opportunities on the storage front, which I will save for a future post!
- Optimized Systems
Rather than general purpose IT equipment, we have now the scale and scope to specialize with systems optimized for particular workloads, the second component of the Smarter Computing initiative. Of course, IBM has been delivering integrated stacks of systems, software and services for decades now, but it is important to remind people of this, as IBM now has a spate of competitors all trying to follow IBM's lead in this arena.
As with Big Data, the focus on Optimized Systems has impacted IBM's strategy on storage as well. I'll save that discussion for a future post as well!
I am glad that nearly all of the storage vendors have standardized to a common definition for Cloud, the third component of Smarter Computing, which shows that this concept has matured:
Cloud computing is a pay-per-use model for enabling network access to a pool of computing resources that can be provisioned and released rapidly with minimal management effort or service provider interaction.
-- U.S. National Institute of Standards and Technology [nist.gov]
Of course, Cloud is just an evolution of IBM's Service Bureau business of the 1960s and 1970s, renting out time-sharing on mainframe systems, Grid Computing of the 1980s, and Application Service Providers that popped up in the 1990s. While the [butchers, bakers and candlestick makers] that IBM competes against might focus their efforts on just private cloud or just public cloud, IBM recognizes the reality is that different clients will need different solutions. Rather than rip-and-replace, IBM will help clients transition to cloud via inclusive solutions that adopt a hybrid approach:
- Traditional enterprise with private cloud deployments, using solutions like IBM CloudBurst, SONAS and Information Archive
- Traditional enterprise with public cloud services to handle seasonable peaks, providing offsite resiliency, and solutions for a mobile workforce
- Hybrid clouds that blend private and public cloud services, to handle seasonal peak workloads, remote and branch offices
IBM's emphasis on IT Infrastructure Library (ITIL), Tivoli and Maximo products will play well in this space to provide integrated service management across traditional and cloud deployments. This is why IBM decided to launch Smarter Computing initiative at Pulse 2011 conference, the industry's premiere conference on intergrated service management.
The IBM Watson that competed on Jeopardy! is an excellent example of all three components of Smarter Computing at work.
- IBM Watson was able to respond to Jeopardy! clues within three seconds, processing a combination of database searches with DB2 and text-mining analytics of unstructured data with IBM BigInsights.
- IBM Watson combined servers, software and storage into an integrated supercomputer that was optimized for one particular workload: playing Jeopardy!
- IBM Watson used many technologies prevalent in private and public cloud computing systems, storing its data on a modified version of SONAS for storage, using xCat administration tools, networking across 10GbE Ethernet, and massive parallel processing through lots of PowerVM guest images.
technorati tags: IBM, Pulse, ibmpulse, Centralized Computing, Distributed Computing, Smarter Computing, Big Data, Optimized Systems, Cloud Computing, SONAS, Netezza, DB2, InfoSphere, BigInsights, SPSS, Data Warehouse, Structured Data, Unstructured Data, Watson, CloudBurst, Information Archive
This week, Sep 19-23, O'Reilly is hosting the Strata Conference on Big Data. I'm not there either, but here is a live feed:
technorati tags: IBM, O'Reilly, Strata, Big Data
Continuing my coverage of the [IBM Storage Innovation Executive Summit], that occurred May 9 in New York City, this is my fifth in a series of blog posts on this event.
- Smart Archiving
Doug Balog, IBM VP and Business Level Executive for Storage, presented Smart Archiving. Citing research by Jon Toigo, Doug indicated that 40 percent of data on disk should be archived. Sadly, a vast majority of companies continue to use their backups as archives. There is a better way to do archives, to address the needs of four use cases:
The IBM Information Archive for email, files and eDiscovery offers full text indexing. A well-deployed archive strategy can save up to 60 percent in backup costs, and reduce backup times by 80 percent. IBM offers advanced analytics and visualization for archive data.
An analysis of a global insurance company found that they kept, on average, 120 copies of every email sent. This was the combination of an average of 12 copies of the email, multipled by 10 backups of the email repository.
Banjercito, a bank in Mexico, has a 10-year retention requirement from government regulations.
The new LTFS Library Edition allows Library-based access to files stored on tape cartridges. The new TS3500 Library Connector means that a single system of connected tape libraries can hold up to 2.7 Exabytes (EB) of data.
- Archive Industry Perspectives
Steve Duplessie from Enterprise Strategy Group [ESG] gave his views on the challenges of volume, access and cost. His definition for archive: the long term retention of information on a separate environment for compliance, eDiscovery and business reference purposes. Steve advocates a purpose-built solutiion for archive. There are three major challenges for implementing an archive solution:
- Getting Participation -- Steve feels that key stakeholders have inappropriate expectations of what archive is, or can be.
- Define Tasks -- Steve argues that archive is very much a process-oriented approach, and tasks must fit business process and procedures
- Prepare for Future Content Types -- the frequent change of standard and proprietary data types poses a real challenge for long term retention of data
For example, the Financial Industry Regulatory Authority [FINRA] oversee 4,000 brokerage firms, and 600,000 broker/dealers. They have mandated the storing of digital data related to stock trades, and this can include text messages, voice messages, and emails. They continue to expand this definition, so soon this could include tweets on Twitter, for example.
Steve feels there are four key requirements for archive:
- Support for email, such as an email application plug-in
- Off-line access to archived data
- Support for mobile devices, such as smartphones
- Basic search capabilities
Companies are starting to take archive seriously. About 35 percent of firms surveyed have adopted archive, and another 36 percent plan to in the next 12-24 months. Enterprise archive has grown over 200 percent from 2007 to 2009. Steve agrees that not everything needs to be stored on disk. Retention periods greater than six years dictates the need for tape.
Current systems may not meet today's requirements. Data loss and downtime costs have skyrocketed. Data Protection and Retention projects can represent a gold mine of savings, new capabilities can greatly lower costs, allowing companies to shift resources over to revenue generation.
- Big Data, New Physics and Geospatial Super-Food
I would vote this the best session of the day! For all those confused on what the heck "Big Data" means, Jeff has the best explanation. Jeff Jonas is an IBM Distinguished Engineer and the Chief Scientist of Entity Analytics. He had just finished his 17th marathon on Saturday, and his fingers were bandaged.
Jeff had founded the Systems Research and Design (SR&D) company, known for creating NORA (non-obvious relationship awareness) used by Las Vegas casinos to identify fraud. SR&D was acquired by IBM back in 2005. Jeff is focused on sensemaking of streams. He feels many companies are suffering from "Enterprise Amnesia".
"The data must find the data .. and the relevance must find the user."
-- Jeff Jonas
Jeff's metaphor to Big Data is a jigsaw puzzle without the picture on the outside of the box. To demonstrate his point, he presented a pile of jigsaw puzzle pieces and asked four teenagers to put the puzzle together without the advantage of the picture on the box. What he had not told them was that he mixed four different puzzles together, removing out 10 to 20 percent of the pieces from each puzzle. He also added some duplicate pieces from a second identical puzzle, and just to make things fun, included a dozen pieces from a sixth puzzle just to mess with their heads. Within a few hours, the kids had managed to figure out that there were four puzzles, that there were duplicate pieces, and that there were some pieces that did not fit any of the four puzzles.
"You can't squeeze knowledge from a pixel."
-- Jeff Jonas
This approach favors false negatives. New observations reverse out old conceptions. As the picture emerges, this provides added focus on new information. More data can provide better predictions. "Bad" data, including misspelled words and mis-coded categories, was often discarded or corrected on the basis of "Garbage-In, Garbage Out", but can now be useful in a Big Data perspective.
Take for example the 600 billion recordings of the "location data" captured on cell phones every day. With regular triangulation of cell phone towers, the information can pinpoint you within 60 meters, add GPS and this improved to within 20 meters, and add Wi-Fi is further improved to 10 meters. While this data is "de-identified" so as not to identify individual users, the process of re-identification is relatively trivial. Jeff's system is able to predict a person will be next Thursday at 5:35pm with 87 percent accuracy.
Thus, Big Data represents an asset, accumulation of context. Real-time analytics can be a competitive advantage. These streams of data will need persistent storage and massive I/O capabilities. In one example, Jeff processed 4,200 separate sources of information and was able to identify "dead votes". These are votes cast by people that died in years prior, indicating voter fraud.
Jeff's latest project, codenamed G2, will tackle not just people, but everything from proteins to asteroids.
Normally, the worst time slot is the hour after lunch, but these presentations kept people's attention.
technorati tags: IBM, Summit, NYC, Doug Balog, Smart Archive, Information Archive, Banjercito, Steve Duplessie, ESG, , FINRA, Big Data, Jeff Jonas, NORA, SRD, Enterprise Amnesia, cell phone, location data
Webcast: How to Diagnose and Cure What Ails Your Storage Infrastructure
Wednesday, March 23, 2011 at 11:00 AM PDT / 11:00 AM Arizona MST / 2:00 PM EDT
Storage is the most poorly utilized infrastructure element -- and the most costly part of hardware budgets -- in most IT shops today. And it’s getting worse. Storage management typically involves nightmarish mash-up of tools for capacity management, performance management and data protection management unique to each array deployed in heterogeneous fabrics. Server and desktop virtualization seem to have made management issues worse, and coming on the heels of changing workloads and data proliferation is the requirement to add data management to the set of responsibilities shouldered by fewer and fewer storage professionals. Forecast for Storage in 2012: more pain as long delayed storage infrastructure refresh becomes mandatory.
In this webcast, fellow blogger Jon Toigo, CEO of Toigo Partners International, of [DrunkenData] fame, and I will take turns assessing the challenges and suggesting real-world solutions to the many issues that confound storage efficiency in contemporary IT. Integrating real world case studies and technology insights, our storage experts will deliver a must see webcast that sets down a strategy for fixing storage...before it fixes you.
Don't miss this event, unless you like the stress of knowing that your next disaster may be a data disaster.
Register for this webcast to come hear me and Jon Toigo talk!
technorati tags: IBM, Webcast, Jon Toigo, Storage Efficiency, Data Protection, Retention, Archive, Smarter Computing, Big Data, Optimized Systems, Cloud Computing