This blog is for the open exchange of ideas relating to IBM Systems, storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
Tony Pearson is a Master Inventor, Senior IT Architect and Event Content Manager for [IBM Systems for IBM Systems Technical University] events. With over 30 years with IBM Systems, Tony is frequent traveler, speaking to clients at events throughout the world.
Lloyd Dean is an IBM Senior Certified Executive IT Architect in Infrastructure Architecture. Lloyd has held numerous senior technical roles at IBM during his 19 plus years at IBM. Lloyd most recently has been leading efforts across the Communication/CSI Market as a senior Storage Solution Architect/CTS covering the Kansas City territory. In prior years Lloyd supported the industry accounts as a Storage Solution architect and prior to that as a Storage Software Solutions specialist during his time in the ATS organization.
Lloyd currently supports North America storage sales teams in his Storage Software Solution Architecture SME role in the Washington Systems Center team. His current focus is with IBM Cloud Private and he will be delivering and supporting sessions at Think2019, and Storage Technical University on the Value of IBM storage in this high value IBM solution a part of the IBM Cloud strategy. Lloyd maintains a Subject Matter Expert status across the IBM Spectrum Storage Software solutions. You can follow Lloyd on Twitter @ldean0558 and LinkedIn Lloyd Dean.
Tony Pearson's books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
The developerWorks Connections Platform is now in read-only mode and content is only available for viewing. No new wiki pages, posts, or messages may be added. Please see our FAQ for more information. The developerWorks Connections platform will officially shut down on March 31, 2020 and content will no longer be available. More details available on our FAQ. (Read in Japanese.)
Continuing my coverage of the 30th annual [Data Center Conference]. here is a recap of Wednesday breakout sessions.
Aging Data: The Challenges of Long-Term Data Retention
The analyst defined "aging data" to be any data that is older than 90 days. A quick poll of the audience showed the what type of data was the biggest challenge:
In addition to aging data, the analyst used the term "vintage" to refer to aging data that you might actually need in the future, and "digital waste" being data you have no use for. She also defined "orphaned" data as data that has been archived but not actively owned or managed by anyone.
You need policies for retention, deletion, legal hold, and access. Most people forget to include access policies. How are people dealing with data and retention policies? Here were the poll results:
The analyst predicts that half of all applications running today will be retired by 2020. Tools like "IBM InfoSphere Optim" can help with application retirement by preserving both the data and metadata needed to make sense of the information after the application is no longer available. App retirement has a strong ROI.
Another problem is that there is data growth in unstructured data, but nobody is given the responsibility of "archivist" for this data, so it goes un-managed and becomes a "dumping ground". Long-term retention involves hardware, software and process working together. The reason that purpose-built archive hardware (such as IBM's Information Archive or EMC's Centera) was that companies failed to get the appropriate software and process to complete the solution.
Cloud computing will help. The analyst estimates that 40 percent of new email deployments will be done in the cloud, such as IBM LotusLive, Google Apps, and Microsoft Online365. This offloads the archive requirement to the public cloud provider.
A case study is University of Minnesota Supercomputing Institute that has three tiers for their storage: 136TB of fast storage for scratch space, 600TB of slower disk for project space, and 640 TB of tape for long-term retention.
What are people using today to hold their long-term retention data? Here were the poll results:
Bottom line is that retention of aging data is a business problem, techology problem, economic problem and 100-year problem.
A Case Study for Deploying a Unified 10G Ethernet Network
Brian Johnson from Intel presented the latest developments on 10Gb Ethernet. Case studies from Yahoo and NASA, both members of the [Open Data Center Alliance] found that upgrading from 1Gb to 10Gb Ethernet was more than just an improvement in speed. Other benefits include:
45 percent reduction in energy costs for Ethernet switching gear
80 percent fewer cables
15 percent lower costs
doubled bandwidth per server
Ruiping Sun, from Yahoo, found that 10Gb FCoE achieved 920 MB/sec, which was 15 percent faster than the 8Gb FCP they were using before.
IBM, Dell and other Intel-based servers support Single Root I/O Virtualization, or SR-IOV for short. NASA found that cloud-based HPC is feasible with SR-IOV. Using IBM General Parallel File System (GPFS) and 10Gb Ethernet were able to replace a previous environment based on 20 Gbps DDR Infiniband.
While some companies are still arguing over whether to implement a private cloud, an archive retention policy, or 10Gb Ethernet, other companies have shown great success moving forward!
Continuing my coverage of the 30th annual [Data Center Conference]. here is a recap of Wednesday morning sessions.
A Data Center Perspective on MegaVendors
The morning started with a keynote session. The analyst felt that the eight most strategic or disruptive companies in the past few decades were: IBM, HP, Cisco, SAP, Oracle, Apple and Google. Of these, he focused on the first three, which he termed the "Megavendors", presented in alphabetical order.
Cisco enjoys high-margins and a loyal customer base with Ethernet switch gear. Their new strategy to sell UP and ACROSS the stack moves them into lower-margin business like servers. Their strong agenda with NetApp is not in sync with their partnership with EMC. They recently had senior management turn-over.
HP enjoys a large customer base and is recognized for good design and manufacturing capabilities. Their challenges are mostly organizational, distracted by changes at the top and an untested and ever-changing vision, shifting gears and messages too often. Concerns over the Itanium have not helped them lately.
IBM defies simple description. One can easily recognize Cisco as an "Ethernet Switch" company, HP as a "Printer Company", Oracle as a "Database Company', but you can't say that IBM is an "XYZ" company, as it has re-invented itself successfully over its past 100 years, with a strong focus on client relationships. IBM enjoys high margins, sustainable cost structure, huge resources, a proficient sales team, and is recognized for its innovation with a strong IBM Research division. Their "Smarter Planet" vision has been effective in supporting their individual brands and unlock new opportuties. IBM's focus on growth markets takes advantage of their global reach.
His final advice was to look for "good enough" solutions that are "built for change" rather than "built to last".
Chris works in the Data Center Management and Optimization Services team. IBM owns and/or manages over 425 data centers, representing over 8 million square feet of floorspace. This includes managing 13 million desktops, and 325,000 x86 and UNIX server images, and 1,235 mainframes. IBM is able to pool resources and segment the complexity for flexible resource balancing.
Chris gave an example of a company that selected a Cloud Compute service provided on the East coast a Cloud Storage provider on the West coast, both for offering low rates, but was disappointed in the latency between the two.
Chris asked "How did 5 percent utilization on x86 servers ever become acceptable?" When IBM is brought in to manage a data center, it takes a "No Server Left Behind" approach to reduce risk and allow for a strong focus on end-user transition. Each server is evaluated for its current utilization:
Amazingly, many servers are unused. These are recycled properly.
1 to 19 percent
Workload is virtualized and moved to a new server.
20 to 39 percent
Use IBM's Active Energy Manager to monitor the server.
40 to 59 percent
Add more VMs to this virtualized server.
over 60 percent
Manage the workload balance on this server.
This approach allows IBM to achieve a 60 to 70 percent utilization average on x86 machines, with an ROI payback period of 6 to 18 months, and 2x-3x increase of servers-managed-per-FTE.
Storage is classified using Information Lifecycle Management (ILM) best practices, using automation with pre-defined data placement and movement policies. This allows only 5 percent of data to be on Tier-1, 15 percent on Tier-2, 15 percent on Tier-3, and 65 percent on Tier-4 storage.
Chris recommends adopting IT Service Management, and to shift away from one-off builds, stand-alone apps, and siloed cost management structures, and over to standardization and shared resources.
You may have heard of "Follow-the-sun" but have you heard of "Follow-the-moon"? Global companies often establish "follow-the-sun" for customer service, re-directing phone calls to be handled by people in countries during their respective daytime hours. In the same manner, server and storage virtualization allows workloads to be moved to data centers during night-time hours, following the moon, to take advantage of "free cooling" using outside air instead of computer room air conditioning (CRAC).
Since 2007, IBM has been able to double computer processing capability without increasing energy consumption or carbon gas emissions.
It's Wednesday, Day 3, and I can tell already that the attendees are suffering from "information overload'.
Continuing my coverage of the 30th annual [Data Center Conference]. Here is a recap of more of the Tuesday afternoon sessions:
IBM CIOs and Storage
Barry Becker, IBM Manager of Global Strategic Outsourcing Enablement for Data Center Services, presented this session on Storage Infrastructure Optimization (SIO).
A bit of context might help. I started my career in DFHSM which moved data from disk to tape to reduce storage costs. Over the years, I wouuld visit clients, analyze their disk and tape environment, and provide a set of recommendations on how to run their operations better. In 2004, this was formalized into week-long "Information Lifecycle Management (ILM) Assessments", and I spent 18 months in the field training a group of folks on how to perform them. The IBM Global Technology Services team have taken a cross-brand approach, expanding this ILM approach to include evaluations of the application workloads and data types. These SIO studies take 3-4 weeks to complete.
Over the next decade, there will only be 50 percent more IT professionals than we have today, so new approaches will be needed for governance and automation to deal with the explosive growth of information.
SIO deals with both the demand and supply of data growth in five specific areas:
Data reclamation, rationalization and planning
Virtualization and tiering
Backup, business continuity and disaster recovery
Storage process and governance
Archive, Retention and Compliance
The process involves gathering data and interview business, financial and technical stakeholders like storage administrators and application owners. The interviews take less than one hour per person.
Over the past two years, the SIO team has uncovered disturbing trends. A big part of the problem is that 70 percent of data stored on disk has not been accessed in the past 90 days, and is unlikely to be accessed at all in the near future, so would probably be better to store on lower cost storage tiers.
Storage Resource Management (SRM) is also a mess, with over 85 percent of clients having serious reporting issues. Even rudimentary "Showback" systems to report back what every individual, group or department were using resulted in significant improvement.
Archive is not universally implemented mostly because retention requirements are often misunderstood. Barry attributed this to lack of collaboration between storage IT personnel, compliance officers, and application owners. A "service catalog" that identifies specific storage and data types can help address many of these concerns.
The results were impressive. Clients that follow SIO recommendations save on average 20 to 25 percent after one year, and 50 percent after three to five years. Implementing storage virtualization averaged 22 percent lower CAPEX costs. Those that implemented a "service catalog" saved on average $1.9 million US dollars. Internally, IBM's own operations have saved $13 million dollars implementing these recommendations over the past three years.
Reshaping Storage for Virtualization and Big Data
The two analysts presenting this topic acknowledged there is no downturn on the demand for storage. To address this, they recommend companies identify storage inefficiencies, develop better forecasting methodologies, implement ILM, and follow vendor management best practices during acquisition and outsourcing.
To deal with new challenges like virtualization and Big Data, companies must decide to keep, replace or supplement their SRM tools, and build a scalable infrastructure.
One suggestion to get upper management to accept new technologies like data deduplication, thin provisioning, and compression is to refer to them as "Green" technologies, as they help reduce energy costs as well. Thin provisioning can help drive up storage utilization to rates as high as you dare, typically 60 to 70 percent is what most people are comfortable with.
A poll of the audience found that top three initiatives for 2012 are to implement data deduplication, 10Gb Ethernet, and Solid-State drives (SSD).
The analysts explained that there are two different types of cloud storage. The first kind is storage "for" the cloud, used for cloud compute instances (aka Virtual Machines), such as Amazon EBS for EC2. The second kind is storage "as" the cloud, storage as a data service, such as Amazon S3, Azure Blob and AT&T Synaptic.
The analysts feel that cloud storage deployments will be mostly private clouds, bursting as needed to public cloud storage. This creates the need for a concept called "Cloud Storage Gateways" that manage this hybrid of some local storage and some remote storage. IBM's SONAS Active Cloud Engine provides long-distance caching in this manner. Other smaller startups include cTera, Nasuni, Panzura, Riverbed, StorSimple, and TwinStrata.
A variation of this are "storage gateways" for backup and archive providers as a staging area for data to be subsequently sent on to the remote location.
New projects like virtualization, Cloud computing and Big Data are giving companies a new opportunity to re-evaluate their strategies for storage, process and governance.
Continuing my coverage of the 30th annual [Data Center Conference]. Here is a recap of some of the Tuesday afternoon sessions:
Brocade: Maximizing Your Cloud: How Data Centers Must Evolve
This was a session sponsored by Brocade to promote their concept of the "Ethernet Fabric". The first speaker, John McHugh, was from Brocade, and the second speaker was a client testimonial, Jamie Shepard, EVP for International Computerware, Inc.
John had an interesting take on today's network challenges. He feels that most LANs are organized for "North-South" traffic, referring to upload/downloads between clients and servers. However, the networks of tomorrow will need to focus on "East-West" traffic, referring to servers talking to other servers.
John was also opposed to integrated stacks that combine servers, storage and networking into a single appliance, as this prevents independent scaling of resources.
The Future of Backup is Not Backup
Primary data is growing at 40 to 60 percent compound annual growth rate (CAGR), but backup data is growing faster. Why? Because data that was not backed up before are now being backed up, including test data, development data, and mobile application data.
Backup costs are 19x more expensive than production software costs. There is an enormous gap in data protection because companies fail to factor this into their budgets. It is not uncommon for IT departments to use multiple backup tools, for example one tool for VMs, and another tool for servers, and a third product for desktops.
part of the problem is identifying who "buys" the backup software. The server team might focus on the operating systems supported. The storage team focuses on the disk and tape media supported. The application owners focus on the features and capabilities for backup that minimize impact to their application.
The analyst organized these issues into three "C's" of backup concerns: Cost, Capability and Complexity. Cost is not just the software license fee for the backup software, but the cost of backup media, courier fees, and transmisison bandwidth. Capability refers to the features and functions, and IT folks are tired of having to augment their backup solution with additional tools and scripts to compensate for lack of capability. Complexity refers to the challenges trying to get existing backup software to tackle new sources like Virtual Machines, Mobile apps, and so on.
Has everyone moved to a tape-less backup system? Polling results found that people are shifting back to tape, either in a tape-only environment, or to supplement their disk or disk-based virtual tape library (VTL). Here are the polling results:
The poll also showed the top three backup software vendors were Symantec, IBM and Commvault, which is consistent with marketshare. However, the analyst feels that by 2014, an estimated 30 percent of companies will change their backup softwar vendor out of frustration over cost, capability and/or complexity.
There are a lot new backup software products specific to dealing with Virtual Machines. Some are focused exclusively on VMware. When asked what tool people used to backup their VMs, the polling results showed the following. NOte that 20 percent for Other includes products from major vendors, like IBM Tivoli Storage Manager for Virtual Environments, as the analyst was more interested in the uptake of backup software from startups.
Some companies are considering Cloud Computing for backup. This is one area where having the cloud service provider at a distance is an actual advantage for added protection. A poll asking whether some or most data is backed up to the Cloud, either already today, or plans for the near future within the next 12 or 24 months, showed the following:
In addition to backup service providers, there are now several startups that offer file sharing, and some are adding "versioning" to this that can serve as an alternative to backup. These include DropBox, SugarSync, iCloud, SpiderOak and ShareFile.
The final topic was Snapshot and Disk Replication. These tend to be hardware-based, so they may not have options for versioning, scheduling, or application-aware capabilities normally associated with backup software. Space-efficient snapshots, which point unchanged data back to the original source, may not provide full data protection that disparate backup copies would provide. Here were polling results on whether snapshot/replication was used to augment or replace some or most of their backups:
Some of his observations and recommendations:
Maintenance is more expensive than acquisition cost. Don't focus on the tip of the iceberg. Some backup software is more efficient for bandwidth and media which will save tons of money in the long run.
Try to optimize what you have. He calls this the "Starbuck's effect". If you just need one coffee, then paying $4.50 for a cup makes sense. But if you need 100 coffees, you might be better off buying the beans.
Design backups to meet service level agreements (SLAs). In the past, backup was treated as one-size-fits-all, but today you can now focus on a workload by workload basis.
Be conservative in adopting new technologies until you have your backup procedures in place to handle data protection.
Backup is for operational recovery, not long-term retention of data. A poll showed two-thirds of the audience kept backup versions for longer than 60 days! Re-evaluate how long you keep backups, and how many versions you keep. If you need long-term retention, use archive process instead.
Recovery testing is a dying art. Practice recovery procedures so that you can do it safely and correctly when it matters most.
The analyst had a series of awesome pictures of large structures, the pyramids of Giza, the Chrysler building, and so on, and how they would look without their foundations in place. Backup is a foundation and should be treated as such in all IT planning purposes.
IT is evolving, but some basic needs like networking and backup procedures don't change. As companies re-evaluate their IT operations for Big Data, Cloud Computing and other new technologies, it is best to remember that some basic needs must be met as part of those evaluations.