Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th year anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
A long time ago, perhaps in the early 1990s, I was an architect on the component known today as DFSMShsm on z/OS mainframe operationg system. One of my job responsibilities was to attend the biannual [SHARE conference to listen to the requirements of the attendees on what they would like added or changed to the DFSMS, and ask enough questions so that I can accurately present the reasoning to the rest of the architects and software designers on my team. One person requested that the DFSMShsm RELEASE HARDCOPY should release "all" the hardcopy. This command sends all the activity logs to the designated SYSOUT printer. I asked what he meant by "all", and the entire audience of 120 some attendees nearly fell on the floor laughing. He complained that some clever programmer wrote code to test if the activity log contained only "Starting" and "Ending" message, but no error messages, and skip those from being sent to SYSOUT. I explained that this was done to save paper, good for the environment, and so on. Again, howls of laughter. Most customers reroute the SYSOUT from DFSMS from a physical printer to a logical one that saves the logs as data sets, with date and time stamps, so having any "skipped" leaves gaps in the sequence. The client wanted a complete set of data sets for his records. Fair enough.
When I returned to Tucson, I presented the list of requests, and the immediate reaction when I presented the one above was, "What did he mean by ALL? Doesn't it release ALL of the logs already?" I then had to recap our entire dialogue, and then it all made sense to the rest of the team. At the following SHARE conference six months later, I was presented with my own official "All" tee-shirt that listed, and I am not kidding, some 33 definitions for the word "all", in small font covering the front of the shirt.
I am reminded of this story because of the challenges explaining complicated IT concepts using the English language which is so full of overloaded words that have multiple meanings. Take for example the word "protect". What does it mean when a client asks for a solution or system to "protect my data" or "protect my information". Let's take a look at three different meanings:
The first meaning is to protect the integrity of the data from within, especially from executives or accountants that might want to "fudge the numbers" to make quarterly results look better than they are, or to "change the terms of the contract" after agreements have been signed. Clients need to make sure that the people authorized to read/write data can be trusted to do so, and to store data in Non-Erasable, Non-Rewriteable (NENR) protected storage for added confidence. NENR storage includes Write-Once, Read-Many (WORM) tape and optical media, disk and disk-and-tape blended solutions such as the IBM Grid Medical Archive Solution (GMAS) and IBM Information Archive integrated system.
The second meaning is to protect access from without, especially hackers or other criminals that might want to gather personally-identifiably information (PII) such as social security numbers, health records, or credit card numbers and use these for identity theft. This is why it is so important to encrypt your data. As I mentioned in my post [Eliminating Technology Trade-Offs], IBM supports hardware-based encryption FDE drives in its IBM System Storage DS8000 and DS5000 series. These FDE drives have an AES-128 bit encryption built-in to perform the encryption in real-time. Neither HDS or EMC support these drives (yet). Fellow blogger Hu Yoshida (HDS) indicates that their USP-V has implemented data-at-rest in their array differently, using backend directors instead. I am told EMC relies on the consumption of CPU-cycles on the host servers to perform software-based encryption, either as MIPS consumed on the mainframe, or using their Powerpath multi-pathing driver on distributed systems.
There is also concern about internal employees have the right "need-to-know" of various research projects or upcoming acquisitions. On SANs, this is normally handled with zoning, and on NAS with appropriate group/owner bits and access control lists. That's fine for LUNs and files, but what about databases? IBM's DB2 offers Label-Based Access Control [LBAC] that provides a finer level of granularity, down to the row or column level. For example, if a hospital database contained patient information, the doctors and nurses would not see the columns containing credit card details, the accountants would not see the columnts containing healthcare details, and the individual patients, if they had any access at all, would only be able to access the rows related to their own records, and possibly the records of their children or other family members.
The third meaning is to protect against the unexpected. There are lots of ways to lose data: physical failure, theft or even incorrect application logic. Whatever the way, you can protect against this by having multiple copies of the data. You can either have multiple copies of the data in its entirety, or use RAID or similar encoding scheme to store parts of the data in multiple separate locations. For example, with RAID-5 rank containing 6+P+S configuration, you would have six parts of data and one part parity code scattered across seven drives. If you lost one of the disk drives, the data can be rebuilt from the remaining portions and written to the spare disk set aside for this purpose.
But what if the drive is stolen? Someone can walk up to a disk system, snap out the hot-swappable drive, and walk off with it. Since it contains only part of the data, the thief would not have the entire copy of the data, so no reason to encrypt it, right? Wrong! Even with part of the data, people can get enough information to cause your company or customers harm, lose business, or otherwise get you in hot water. Encryption of the data at rest can help protect against unauthorized access to the data, even in the case when the data is scattered in this manner across multiple drives.
To protect against site-wide loss, such as from a natural disaster, fire, flood, earthquake and so on, you might consider having data replicated to remote locations. For example, IBM's DS8000 offers two-site and three-site mirroring. Two-site options include Metro Mirror (synchronous) and Global Mirror (asynchronous). The three-site is cascaded Metro/Global Mirror with the second site nearby (within 300km) and the third site far away. For example, you can have two copies of your data at site 1, a third copy at nearby site 2, and two more copies at site 3. Five copies of data in three locations. IBM DS8000 can send this data over from one box to another with only a single round trip (sending the data out, and getting an acknowledgment back). By comparison, EMC SRDF/S (synchronous) takes one or two trips depending on blocksize, for example blocks larger than 32KB require two trips, and EMC SRDF/A (asynchronous) always takes two trips. This is important because for many companies, disk is cheap but long-distance bandwidth is quite expensive. Having five copies in three locations could be less expensive than four copies in four locations.
Fellow blogger BarryB (EMC Storage Anarchist) felt I was unfair pointing out that their EMC Atmos GeoProtect feature only protects against "unexpected loss" and does not eliminate the need for encryption or appropriate access control lists to protect against "unauthorized access" or "unethical tampering".
(It appears I stepped too far on to ChuckH's lawn, as his Rottweiler BarryB came out barking, both in the [comments on my own blog post], as well as his latest titled [IBM dumbs down IBM marketing (again)]. Before I get another rash of comments, I want to emphasize this is a metaphor only, and that I am not accusing BarryB of having any canine DNA running through his veins, nor that Chuck Hollis has a lawn.)
As far as I know, the EMC Atmos does not support FDE disks that do this encryption for you, so you might need to find another way to encrypt the data and set up the appropriate access control lists. I agree with BarryB that "erasure codes" have been around for a while and that there is nothing unsafe about using them in this manner. All forms of RAID-5, RAID-6 and even RAID-X on the IBM XIV storage system can be considered a form of such encoding as well. As for the amount of long-distance bandwidth that Atmos GeoProtect would consume to provide this protection against loss, you might question any cost savings from this space-efficient solution. As always, you should consider both space and bandwidth costs in your total cost of ownership calculations.
Of course, if saving money is your main concern, you should consider tape, which can be ten to twenty times cheaper than disk, affording you to keep a dozen or more copies, in as many time zones, at substantially lower cost. These can be encrypted and written to WORM media for even more thorough protection.
I presented IBM's Smart Archive strategy and the storage products IBM offers to archive data and meet compliance regulations:
The differences between backup and archive, including a few of my own personal horror stories helping companies who had foolishly thought that keeping backup copies for years would adequately serve as their archive strategy
The differences between Write-Once Read-Many (WORM) media, and Non-Erasable, Non-Rewriteable (NENR) storage options.
How disk-only archive solutions become "space heaters" for your data center.
An overview of the various storage hardware options from IBM.
An explanation of the different IBM software offerings to help complement the storage hardware choices.
IBM TotalStorage Productivity Center (TPC): New Features and Functions
Mike Griese, IBM program manager for TPC, presented the latest in TPC 5.1 version announced this week. His session was organized into four key sections:
Insights - TPC 5.1 integrates COGNOS reporting, which allows custonmization of reports and ad-hoc exploration and analysis. Since the reports are not binary-compiled into the product, IBM can ship new COGNOS reports as templates outside the normal TPC release schedule. Also, TPC 5.1 got smarter on reporting on server virtualization hypervisor environments to avoid double-counting.
Recommendations - TPC 5.1 can analyze your usage patterns across the entire data center and make recommendations to move data from one storage tier to another. You can then act on these recommendations by moving data from one tier to another, either "up-tier" to faster storage, or "down-tier" to less expensive storage, using a storage hypervisor like IBM SAN Volume Controller. This is complementary to features like Easy Tier which optimize within a single disk system.
Performance - TPC 5.1 uses a new web-based GUI, based on AJAX, HTML5 and Dojo widgets, inspired by the IBM XIV GUI, and similar to the web-based GUI of SAN Volume Controller, Storwize V7000 and SONAS.
Mike also explained the new TPC 5.1 packaging. Instead of having a variety of components like "TPC for Disk", "TPC for Data", and "TPC for Replication", the new packaging simplifies this down to two levels of functionality. The basic level supports block-level devices, including disk performance, replication and SAN fabric management. The advanced level adds support for files and databases, including support for Cloud management such as SONAS environments.
Dan Zehnpfennig, Solution Architect, talked about his experiences installing TPC 5.1 and how this was much improved over previous TPC versions.
IBM Watson: How it Works and What it Means for Society Beyond Winning Jeopardy!
There is still time to enroll for [IBM Edge], a conference focused on storage, to be held June 4-8 in Orlando, Florida. There is an early-bird discount until May 6!
I will be there all week! Here are the seven sessions I will be presenting at the Technical Edge side of the event:
Understanding Your Options for Storing Archive Data to Meet Compliance Challenges
This session will cover the IBM software and hardware solutions that your organization can use to store archive data, including features like immutability, Write-Once-Read-Many (WORM) technology and Non-Erasable, Non-Rewriteable (NENR) enforcement. The discussion will include high-level concepts like chronological and event-based retention, litigation hold and release, as well as an overview of the products and solutions from IBM that you can deploy today.
IBM Watson: How it Works and What it Means for Society Beyond Winning Jeopardy!
In 2011, the IBM Watson computer was able to beat the top-earning human winners on the trivia game-show “Jeopardy!” As I was the author of [How to Build Your Own Watson Junior in Your Basement], I have been asked to explain how the IBM Watson system was put together, how it works, and what examples of text mining and big data analytics means for society as we apply technology to meet tomorrow's challenges.
Using Social Media for IBM System Storage - Birds of a Feather
I will be moderating this Birds of a Feather, or BOF, session that will bring together a Q&A panel of experts on how social media can be leveraged to help you do your job, get the information you need to solve problems, and share your knowledge with others.
Data Footprint Reduction: Understanding IBM Storage Efficiency Options
Data Footprint Reduction is the catch-all term for a variety of technologies designed to help reduce storage costs. In this session, I will cover thin provisioning, space-efficient copies, deduplication and compression technologies, and describe the IBM storage products that provide these capabilities.
IBM's Storage Strategy in the Smarter Computing Era
Confused about IBM's new initiatives for Big Data analytics, Workload Optimized Systems, and Cloud Computing? This session will explain it all, and how IBM's strategy for its various storage products and solutions fit into these overall themes.
IBM SONAS and the IBM Cloud Storage Taxonomy
Confused over the different types of cloud storage? IBM's scale-out Network Attached Storage (SONAS) can be used in a variety of use cases. This session will provide an overview of IBM's SONAS solution, provide an update on the latest features and functions recently announced, and explain how it can be deployed in various private, public and hybrid cloud environments.
IBM Tivoli Storage Productivity Center Overview and Update
IBM has enhanced its premier storage infrastructure management tool: IBM Tivoli Storage Productivity Center. This session will provide both an overview of the product, and explain the latest features and functions recently announced.
Continuing my coverage of the 30th annual [Data Center Conference]. here is a recap of Wednesday breakout sessions.
Aging Data: The Challenges of Long-Term Data Retention
The analyst defined "aging data" to be any data that is older than 90 days. A quick poll of the audience showed the what type of data was the biggest challenge:
In addition to aging data, the analyst used the term "vintage" to refer to aging data that you might actually need in the future, and "digital waste" being data you have no use for. She also defined "orphaned" data as data that has been archived but not actively owned or managed by anyone.
You need policies for retention, deletion, legal hold, and access. Most people forget to include access policies. How are people dealing with data and retention policies? Here were the poll results:
The analyst predicts that half of all applications running today will be retired by 2020. Tools like "IBM InfoSphere Optim" can help with application retirement by preserving both the data and metadata needed to make sense of the information after the application is no longer available. App retirement has a strong ROI.
Another problem is that there is data growth in unstructured data, but nobody is given the responsibility of "archivist" for this data, so it goes un-managed and becomes a "dumping ground". Long-term retention involves hardware, software and process working together. The reason that purpose-built archive hardware (such as IBM's Information Archive or EMC's Centera) was that companies failed to get the appropriate software and process to complete the solution.
Cloud computing will help. The analyst estimates that 40 percent of new email deployments will be done in the cloud, such as IBM LotusLive, Google Apps, and Microsoft Online365. This offloads the archive requirement to the public cloud provider.
A case study is University of Minnesota Supercomputing Institute that has three tiers for their storage: 136TB of fast storage for scratch space, 600TB of slower disk for project space, and 640 TB of tape for long-term retention.
What are people using today to hold their long-term retention data? Here were the poll results:
Bottom line is that retention of aging data is a business problem, techology problem, economic problem and 100-year problem.
A Case Study for Deploying a Unified 10G Ethernet Network
Brian Johnson from Intel presented the latest developments on 10Gb Ethernet. Case studies from Yahoo and NASA, both members of the [Open Data Center Alliance] found that upgrading from 1Gb to 10Gb Ethernet was more than just an improvement in speed. Other benefits include:
45 percent reduction in energy costs for Ethernet switching gear
80 percent fewer cables
15 percent lower costs
doubled bandwidth per server
Ruiping Sun, from Yahoo, found that 10Gb FCoE achieved 920 MB/sec, which was 15 percent faster than the 8Gb FCP they were using before.
IBM, Dell and other Intel-based servers support Single Root I/O Virtualization, or SR-IOV for short. NASA found that cloud-based HPC is feasible with SR-IOV. Using IBM General Parallel File System (GPFS) and 10Gb Ethernet were able to replace a previous environment based on 20 Gbps DDR Infiniband.
While some companies are still arguing over whether to implement a private cloud, an archive retention policy, or 10Gb Ethernet, other companies have shown great success moving forward!
IBM had over a dozen storage-related announcements this week. This is my third and final part in my series to provide a quick overview of the announcements.
IBM Tivoli® Storage Manager v6.3
IBM Tivoli Storage Manager is market-leading software that provides not just backup, but also HSM and archive capabilities across a wide variety of operating systems. Originally developed in the IBM Almaden Research Center, it then moved about 15 years ago to Tucson to become a commercial product.
The new TSM v6.3 introduces site-to-site hot-standby disaster recovery feature that replicates the TSM meta data and data for fast recovery. The maximum number of objects supported has doubled to four billion. Reporting has been enhanced using technologies borrowed from IBM Cognos. Lastly, a feature on Tivoli Storage Productivity Center has been carried forward to deploy and update agents on the various clients.
IBM Tivoli Storage FlashCopy Manager coordinates application-aware backups through the use of point-in-time copy services such as FlashCopy or Snapshot on various IBM and non-IBM disk systems. The versions can remain on disk, or optionally processed by Tivoli Storage Manager to move them to external storage such as tape for added protection.
There will always be a spot in my heart for this product, as the method to use FlashCopy for application-aware backups on the mainframe was my 19th patent, and subsequently delivered as a series of enhancements to DFSMS over the past decade on the z/OS operating system. It is good to see this innovation has "jumped over" to distributed systems.
The new FlashCopy Manager v3.1 adds support for HP-UX and VMware, expands support for IBM DB2 and Oraqcle databases, and introduces an interface for custom business applications.
IBM Tivoli Storage Manager for Virtual Environments v6.3
TSM for VE is a new addition to the TSM family, focused on being able to coordinate hypervisor-aware data protection. Initially it supports VMware, but IBM has plans to support a variety of other server virtualization hypervisors as well, as over 40 percent of companies run two or more hypervisors in their data center.
The new TSM for VE v6.3 adds a VMware vCenter plug-in, and support for hardware-based disk snapshots.
IBM Tivoli Storage Productivity Center v4.2.2
A long time ago, I was the chief architect IBM Tivoli Storage Productivity Center v1, now we are already up to v4.2.2 release!
IBM has added enhanced reporting based on IBM Cognos technology, including storage tiering analysis reports (STAR). Few companies keep all of their storage tiers in a single disk system. Rather, they have different boxes, and often from different vendors. IBM's Productivity Center can report on both IBM and non-IBM disk systems. New this release is support for the internal disks of the Storwize V7000 midrange disk system.
Productivity Center's "SAN Planner" has been enhanced to consider XIV replication criteria. This SAN Planner helps clients decide where to carve LUNs, and to make sure they pick the right place given all of the criteria such as remote copy replications.
Last year, we introduced Productivity Center for Disk Midrange Edition (MRE) which to offer lower price when you are only managing midrange disk systems DS5000, DS3000, Storwize V7000 and SVC managing these. This was so successful, that we now have TPC Select, which is basically Productivity Center Standard Edition (SE) for these midrange disk systems.
Whew! I have already heard from some of my readers to slow down, that this is too much information to deal with all at once. IBM has tried everything from having just a few announcements nearly every Tuesday, to having huge launches every two to three years, and settled in the middle with announcements about four to five times per year.