Continuing my coverage of the [IBM Storage Innovation Executive Summit], that occurred May 9 in New York City, this is my fifth in a series of blog posts on this event.
- Smart Archiving
Doug Balog, IBM VP and Business Level Executive for Storage, presented Smart Archiving. Citing research by Jon Toigo, Doug indicated that 40 percent of data on disk should be archived. Sadly, a vast majority of companies continue to use their backups as archives. There is a better way to do archives, to address the needs of four use cases:
The IBM Information Archive for email, files and eDiscovery offers full text indexing. A well-deployed archive strategy can save up to 60 percent in backup costs, and reduce backup times by 80 percent. IBM offers advanced analytics and visualization for archive data.
An analysis of a global insurance company found that they kept, on average, 120 copies of every email sent. This was the combination of an average of 12 copies of the email, multipled by 10 backups of the email repository.
Banjercito, a bank in Mexico, has a 10-year retention requirement from government regulations.
The new LTFS Library Edition allows Library-based access to files stored on tape cartridges. The new TS3500 Library Connector means that a single system of connected tape libraries can hold up to 2.7 Exabytes (EB) of data.
- Archive Industry Perspectives
Steve Duplessie from Enterprise Strategy Group [ESG] gave his views on the challenges of volume, access and cost. His definition for archive: the long term retention of information on a separate environment for compliance, eDiscovery and business reference purposes. Steve advocates a purpose-built solutiion for archive. There are three major challenges for implementing an archive solution:
- Getting Participation -- Steve feels that key stakeholders have inappropriate expectations of what archive is, or can be.
- Define Tasks -- Steve argues that archive is very much a process-oriented approach, and tasks must fit business process and procedures
- Prepare for Future Content Types -- the frequent change of standard and proprietary data types poses a real challenge for long term retention of data
For example, the Financial Industry Regulatory Authority [FINRA] oversee 4,000 brokerage firms, and 600,000 broker/dealers. They have mandated the storing of digital data related to stock trades, and this can include text messages, voice messages, and emails. They continue to expand this definition, so soon this could include tweets on Twitter, for example.
Steve feels there are four key requirements for archive:
- Support for email, such as an email application plug-in
- Off-line access to archived data
- Support for mobile devices, such as smartphones
- Basic search capabilities
Companies are starting to take archive seriously. About 35 percent of firms surveyed have adopted archive, and another 36 percent plan to in the next 12-24 months. Enterprise archive has grown over 200 percent from 2007 to 2009. Steve agrees that not everything needs to be stored on disk. Retention periods greater than six years dictates the need for tape.
Current systems may not meet today's requirements. Data loss and downtime costs have skyrocketed. Data Protection and Retention projects can represent a gold mine of savings, new capabilities can greatly lower costs, allowing companies to shift resources over to revenue generation.
- Big Data, New Physics and Geospatial Super-Food
I would vote this the best session of the day! For all those confused on what the heck "Big Data" means, Jeff has the best explanation. Jeff Jonas is an IBM Distinguished Engineer and the Chief Scientist of Entity Analytics. He had just finished his 17th marathon on Saturday, and his fingers were bandaged.
Jeff had founded the Systems Research and Design (SR&D) company, known for creating NORA (non-obvious relationship awareness) used by Las Vegas casinos to identify fraud. SR&D was acquired by IBM back in 2005. Jeff is focused on sensemaking of streams. He feels many companies are suffering from "Enterprise Amnesia".
"The data must find the data .. and the relevance must find the user."
-- Jeff Jonas
Jeff's metaphor to Big Data is a jigsaw puzzle without the picture on the outside of the box. To demonstrate his point, he presented a pile of jigsaw puzzle pieces and asked four teenagers to put the puzzle together without the advantage of the picture on the box. What he had not told them was that he mixed four different puzzles together, removing out 10 to 20 percent of the pieces from each puzzle. He also added some duplicate pieces from a second identical puzzle, and just to make things fun, included a dozen pieces from a sixth puzzle just to mess with their heads. Within a few hours, the kids had managed to figure out that there were four puzzles, that there were duplicate pieces, and that there were some pieces that did not fit any of the four puzzles.
"You can't squeeze knowledge from a pixel."
-- Jeff Jonas
This approach favors false negatives. New observations reverse out old conceptions. As the picture emerges, this provides added focus on new information. More data can provide better predictions. "Bad" data, including misspelled words and mis-coded categories, was often discarded or corrected on the basis of "Garbage-In, Garbage Out", but can now be useful in a Big Data perspective.
Take for example the 600 billion recordings of the "location data" captured on cell phones every day. With regular triangulation of cell phone towers, the information can pinpoint you within 60 meters, add GPS and this improved to within 20 meters, and add Wi-Fi is further improved to 10 meters. While this data is "de-identified" so as not to identify individual users, the process of re-identification is relatively trivial. Jeff's system is able to predict a person will be next Thursday at 5:35pm with 87 percent accuracy.
Thus, Big Data represents an asset, accumulation of context. Real-time analytics can be a competitive advantage. These streams of data will need persistent storage and massive I/O capabilities. In one example, Jeff processed 4,200 separate sources of information and was able to identify "dead votes". These are votes cast by people that died in years prior, indicating voter fraud.
Jeff's latest project, codenamed G2, will tackle not just people, but everything from proteins to asteroids.
Normally, the worst time slot is the hour after lunch, but these presentations kept people's attention.
technorati tags: IBM, Summit, NYC, Doug Balog, Smart Archive, Information Archive, Banjercito, Steve Duplessie, ESG, , FINRA, Big Data, Jeff Jonas, NORA, SRD, Enterprise Amnesia, cell phone, location data
Webcast: How to Diagnose and Cure What Ails Your Storage Infrastructure
Wednesday, March 23, 2011 at 11:00 AM PDT / 11:00 AM Arizona MST / 2:00 PM EDT
Storage is the most poorly utilized infrastructure element -- and the most costly part of hardware budgets -- in most IT shops today. And it’s getting worse. Storage management typically involves nightmarish mash-up of tools for capacity management, performance management and data protection management unique to each array deployed in heterogeneous fabrics. Server and desktop virtualization seem to have made management issues worse, and coming on the heels of changing workloads and data proliferation is the requirement to add data management to the set of responsibilities shouldered by fewer and fewer storage professionals. Forecast for Storage in 2012: more pain as long delayed storage infrastructure refresh becomes mandatory.
In this webcast, fellow blogger Jon Toigo, CEO of Toigo Partners International, of [DrunkenData] fame, and I will take turns assessing the challenges and suggesting real-world solutions to the many issues that confound storage efficiency in contemporary IT. Integrating real world case studies and technology insights, our storage experts will deliver a must see webcast that sets down a strategy for fixing storage...before it fixes you.
Don't miss this event, unless you like the stress of knowing that your next disaster may be a data disaster.
Register for this webcast to come hear me and Jon Toigo talk!
technorati tags: IBM, Webcast, Jon Toigo, Storage Efficiency, Data Protection, Retention, Archive, Smarter Computing, Big Data, Optimized Systems, Cloud Computing
Wrapping up my week's coverage of the IBM Pulse 2011 conference, I have had several people ask me to explain IBM's latest initiative, Smarter Computing, which IBM launched this week at this conference. Having led the IT industry through the Centralized Computing era and the Distributed Computing era, IBM is now well-positioned to help companies, governments and non-profit organizations to enter the new Smarter Computing era, focused on insight and discovery.
|Centralized Computing||Distributed Computing||Smarter Computing|
- Thousands of IT professionals
- Mainframe servers
- Effiicent, but only the largest companies and governments had them
- Millions of office workers
- Personal computers (PC)
- Innovative, extending the reach to small and medium-sized businesses, but resulted in server sprawl and increased TCO
- Billions of people
- Smart phones and other handheld devices
- Efficient and Innovative, combining the best of centralized and distributed computing
|1952 to 1980||1981 to 2010||2011 and beyond|
To help clients with this transition, IBM's Smarter Computing initiative has three main components. This is a corporate-wide strategy, with systems, software and services all working together to realize results.
- Big Data
The first component is Big Data. This combines three different sources of data:
- Traditional structured data in OLTP databases and OLAP data warehouses, using data management solutions like DB2 and IBM Netezza.
- Unstructured data, including text documents, images, audio, and video, processed with massive parallelism using IBM BigInsights and Apache Hadoop.
- Real-Time Analytics Processing (RTAP) of incoming data, including video surveillance, social media, RFID chips, smart meters, and traffic control systems, processed with IBM InfoSphere Streams
Of course, Big Data will bring new opportunities on the storage front, which I will save for a future post!
- Optimized Systems
Rather than general purpose IT equipment, we have now the scale and scope to specialize with systems optimized for particular workloads, the second component of the Smarter Computing initiative. Of course, IBM has been delivering integrated stacks of systems, software and services for decades now, but it is important to remind people of this, as IBM now has a spate of competitors all trying to follow IBM's lead in this arena.
As with Big Data, the focus on Optimized Systems has impacted IBM's strategy on storage as well. I'll save that discussion for a future post as well!
I am glad that nearly all of the storage vendors have standardized to a common definition for Cloud, the third component of Smarter Computing, which shows that this concept has matured:
Cloud computing is a pay-per-use model for enabling network access to a pool of computing resources that can be provisioned and released rapidly with minimal management effort or service provider interaction.
-- U.S. National Institute of Standards and Technology [nist.gov]
Of course, Cloud is just an evolution of IBM's Service Bureau business of the 1960s and 1970s, renting out time-sharing on mainframe systems, Grid Computing of the 1980s, and Application Service Providers that popped up in the 1990s. While the [butchers, bakers and candlestick makers] that IBM competes against might focus their efforts on just private cloud or just public cloud, IBM recognizes the reality is that different clients will need different solutions. Rather than rip-and-replace, IBM will help clients transition to cloud via inclusive solutions that adopt a hybrid approach:
- Traditional enterprise with private cloud deployments, using solutions like IBM CloudBurst, SONAS and Information Archive
- Traditional enterprise with public cloud services to handle seasonable peaks, providing offsite resiliency, and solutions for a mobile workforce
- Hybrid clouds that blend private and public cloud services, to handle seasonal peak workloads, remote and branch offices
IBM's emphasis on IT Infrastructure Library (ITIL), Tivoli and Maximo products will play well in this space to provide integrated service management across traditional and cloud deployments. This is why IBM decided to launch Smarter Computing initiative at Pulse 2011 conference, the industry's premiere conference on intergrated service management.
The IBM Watson that competed on Jeopardy! is an excellent example of all three components of Smarter Computing at work.
- IBM Watson was able to respond to Jeopardy! clues within three seconds, processing a combination of database searches with DB2 and text-mining analytics of unstructured data with IBM BigInsights.
- IBM Watson combined servers, software and storage into an integrated supercomputer that was optimized for one particular workload: playing Jeopardy!
- IBM Watson used many technologies prevalent in private and public cloud computing systems, storing its data on a modified version of SONAS for storage, using xCat administration tools, networking across 10GbE Ethernet, and massive parallel processing through lots of PowerVM guest images.
technorati tags: IBM, Pulse, ibmpulse, Centralized Computing, Distributed Computing, Smarter Computing, Big Data, Optimized Systems, Cloud Computing, SONAS, Netezza, DB2, InfoSphere, BigInsights, SPSS, Data Warehouse, Structured Data, Unstructured Data, Watson, CloudBurst, Information Archive
Continuing my coverage of the Data Center 2010 conference, Monday I attended four keynote sessions.
- Opening Remarks
The first keynote speaker started out with an [English proverb]: Turbulent waters make for skillful mariners.
He covered the state of the global economy and how CIOs should address the challenge. We are on the flat end of an "L-shaped" recovery in the United States. GDP growth is expected to be only 4.7 percent Latin America, 2.3 percent in North America, 1.5 percent Europe. Top growth areas include 8.0 percent India and 8.6 percent China, with an average of 4.7 growth for the entire Asia Pacific region.
On the technical side, the top technologies that CIOs are pursuing for 2011 are Cloud Computing, Virtualization, Mobility, and Business Intelligence/Analytics. He asked the audience if the "Stack Wars" for integrated systems are hurting or helping innovation in these areas.
Move over "conflict diamonds", companies now need to worry about [conflict minerals].
He proposed an alternative approach called Fabric-Based Infrastructure. In this new model, a shared pool of servers is connected to a shared pool of storage over an any-to-any network. In this approach, IT staff spend all of their time just stocking up the vending machine, allowing end-users to get the resources they need.
- Crucial Trends You Need to Watch
The second speaker covered ten trends to watch, but these were not limited to just technology trends.
- Virtualization is just beginning - even though IBM has had server virtualization since 1967 and storage virtualization since 1974, the speaker felt that adoption of virtualization is still in its infancy. Ten years ago, average CPU utilization for x86 servers of was only 5-7 percent. Thanks to server virtualization like VMware and Hyper-V, companies have increased this to 25 percent, but many projects to virtualized have stalled.
- Big Data is the elephant in the room - storage growth is expected to grow 800 percent over the next 5 years.
- Green IT - Datacenters consume 40 to 100 times more energy than the offices they support. Six months ago, Energy Star had announced [standards for datacenters] and energy efficiency initiatives.
- Unified Communications - Voice over IP (VoIP) technologies, collaboration with email and instant messages, and focus on Mobile smartphones and other devices combines many overlapping areas of communication.
- Staff retention and retraining - According to US Labor statistics, the average worker will have 10 to 14 different jobs by the time they reach 38 years of age. People need to broaden their scope and not be so vertically focused on specific areas.
- Social Networks and Web 2.0 - the keynote speaker feels this is happening, and companies that try to restrict usage at work are fighting an uphill battle. Better to get ready for it and adopt appropriate policies.
- Legacy Migrations - companies are stuck on old technology like Microsoft Windows XP, Internet Explorer 6, and older levels of Office applications. Time is running out, but migration to later releases or alternatives like Red Hat Linux with Firefox browser are not trivial tasks.
- Compute Density - Moore's Law that says compute capability will double every 18 months is still going strong. We are now getting more cores per socket, forcing applications to re-write for parallel processing, or use virtualization technologies.
- Cloud Computing - every session this week will mention Cloud Computing.
- Converged Fabrics - some new approaches are taking shape for datacenter design. Fabric-based infrastructure would benefit from converging SAN and LAN fabrics to allow pools of servers to communicate freely to pools of storage.
He sprinkled fun factoids about our world to keep things entertaining.
- 50 percent of today's 21-year-olds have produced content for the web. 70 percent of four-year-olds have used a computer. The average teenager writes 2,282 text messages on their cell phone per month.
- This year, Google averaged 31 billion searches per month, compared 2.6 billion searches per month in 2007.
- More video has been uploaded to YouTube in the last two months than the three major US networks (ABC, NBC, CBS) have aired since 1948.
- Wikipedia averages 4300 new articles per day, and now has over 13 million articles.
- This year, Facebook reached 500 million users. If it were a country, it would be ranked third. Twitter would be ranked 7th, with 69% of their growth being from people 32-50 years old.
- In 1997, a GB of flash memory cost nearly $8000 to manufacture, today it is only $1.25 instead.
- The computer in today's cell phone is million times cheaper, and thousand times more powerful, than a single computer installed at MIT back in 1965. In 25 years, the compute capacity of today's cell phones could fit inside a blood cell.
See [interview of Ray Kurzweil] on the Singularity for more details.
- The Virtualization Scenario: 2010 to 2015
The third keynote covered virtualization. While server virtualization has helped reduce server costs, as well as power and cooling energy consumption, it has had a negative effect on other areas. Companies that have adopted server virtualization have discovered increased costs for storage, software and test/development efforts.
The result is a gap between expectations and reality. Many virtualization projects have stalled because there is a lack of long-term planning. The analysts recommend deploying virtualization in stages, tackle the first third, so called "low hanging fruit", then proceed with the next third, and then wait and evaluate results before completing the last third, most difficult applications.
Virtualization of storage and desktop clients are completely different projects than server virtualization and should be handled accordingly.
- Cloud Computing: Riding the Storm Out
The fourth keynote focus on the pros and cons of Cloud Computing. First they start by defining the five key attributes of Cloud: self-service, scalable elasticity, shared pool of resources, metered and paid per use, over open standard networking technologies.
In addition to IaaS, PaaS and SaaS classifications, the keynote speaker mentioned a fourth one: Business Process as a Service (BPaaS), such as processing Payroll or printing invoices.
While the debate rages over the benefits between private and public cloud approaches, the keynote speaker brings up the opportunites for hybrid and community clouds. In fact, he felt there is a business model for a "cloud broker" that acts as the go-between companies and cloud service providers.
A poll of the audience found the top concerns inhibiting cloud adoption were security, privacy, regulatory compliance and immaturity. Some 66 percent indicated they plan to spend more on private cloud in 2011, and 20 percent plan to spend more on public cloud options. He suggested six focus areas:
- Test and Development
- Prototyping / Proof-of-Concept efforts
- Web Application serving
- SaaS like email and business analytics
- Department-level applications
- Select workloads that lend themselves to parallelization
The session wrapped up with some stunning results reported by companies. Server provisioning accomplished in 3-5 minutes instead of 7-12 weeks. Reduced cost of email by 70 percent. Four-hour batch jobs now completed in 20 minutes. 50 percent increase in compute capacity with flat IT budget. With these kind of results, the speaker suggests that CIOs should at least start experimenting with cloud technologies and start to profile their workloads and IT services to develop a strategy.
That was just Monday morning, this is going to be an interesting week!
technorati tags: IBM, GDP, Cloud Computing, virtualization, mobility, BI, CIO, Big Data, Green IT, Google, Twitter, Facebook, IaaS, PaaS, SaaS, BPaaS