This blog is for the open exchange of ideas relating to IBM Systems, storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson)
Tony Pearson is a Master Inventor, Senior IT Architect and Event Content Manager for [IBM Systems Technical University] events. With over 30 years at IBM Systems, Tony is a frequent traveler, speaking to clients at events throughout the world.
Lloyd Dean is an IBM Senior Certified Executive IT Architect in Infrastructure Architecture. Lloyd has held numerous senior technical roles during his more than 19 years at IBM. Most recently, he has been leading efforts across the Communication/CSI Market as a senior Storage Solution Architect/CTS covering the Kansas City territory. Before that, Lloyd supported industry accounts as a Storage Solution Architect, and prior to that he was a Storage Software Solutions specialist in the ATS organization.
Lloyd currently supports North America storage sales teams in his role as Storage Software Solution Architecture SME in the Washington Systems Center team. His current focus is IBM Cloud Private, and he will be delivering and supporting sessions at Think2019 and Storage Technical University on the value of IBM storage in this high-value IBM solution, which is part of the IBM Cloud strategy. Lloyd maintains Subject Matter Expert status across the IBM Spectrum Storage Software solutions. You can follow Lloyd on Twitter @ldean0558 and on LinkedIn (Lloyd Dean).
Tony Pearson's books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remain at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue from sales of the books he has authored, listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
The developerWorks Connections platform will be sunset on December 31, 2019. On January 1, 2020, this community and its apps will no longer be available. More details available on our FAQ.
During the break, I talked with some of the other bloggers at this event. From left to right: Stephen Foskett ([Pack Rat] blog), Devang Panchigar ([StorageNerve] blog), and yours truly, Tony Pearson. (Picture courtesy of Stephen Foskett)
Meet the Experts
This next segment was a Q&A panel, with a moderator posing questions to four experts. Originally, I was scheduled to be the moderator, but Doug Balog took over that role. The experts on the panel were:
Rich Castagna, Editorial Director for Storage Media, TechTarget. TechTarget is the group that runs the [SearchStorage] website.
Stan Zaffos, Gartner VP of Research, who spoke earlier today. I have worked with Stan for years as well, and have attended the last four Gartner Data Center Conferences held every December in Las Vegas.
Steve Duplessie, Founder and Senior Analyst, Enterprise Strategy Group (ESG). Steve's blog is titled [The Bigger Truth].
Jon clarified a statement that Doug Balog had made earlier in the day and attributed to Jon's study. Doug had said that 40 percent of all data should be archived. What Jon's study actually found was that, on average, of the data on disk systems, about 30 percent is useful data, 40 percent is not active and could be eligible for archive, and the remaining 30 percent is crap.
The other experts introduced themselves. Rich felt that "Cloud" was still the biggest buzzword in the IT industry. Stan felt that CIOs should ask their storage administrators, "What are you doing to improve my agility and efficiency?" Steve felt that it was better to focus on improving processes and procedures, rather than trying to deploy the best technology.
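To make these proportions concrete, here is a small Python sketch that applies the study's average split to a disk estate. The 100 TB total is a hypothetical figure chosen only for illustration, and any real environment will deviate from these averages.

```python
# Apply the average split reported in Jon Toigo's study to a given amount
# of disk capacity. The percentages come from the panel discussion; the
# 100 TB total used in the example is purely hypothetical.
SPLIT_PERCENT = {
    "useful": 30,            # actively used data
    "archive_eligible": 40,  # not active, could be moved to archive
    "crap": 30,              # the remaining data with no business value
}


def classify_capacity(total_tb):
    """Return how many TB fall into each category of the study."""
    return {name: total_tb * pct / 100 for name, pct in SPLIT_PERCENT.items()}


if __name__ == "__main__":
    for name, tb in classify_capacity(100).items():
        print(f"{name}: {tb:.0f} TB")
```

With a hypothetical 100 TB of disk, only 30 TB is useful data, 40 TB could move to an archive tier, and the remaining 30 TB falls into the last category.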
How can you best reduce backup costs per TB?
Jon- use tape.
Rich- Clean up your environment.
Stan- Don't rehydrate your deduplicated data, adopt archive approach, and revisit your backup schedules.
Steve- Deduplication covers up stupidity. No band-aids! Companies need to address the cause.
Does Backup as a Public Service for large enterprises make sense?
Rich- Yes, especially for those with Remote Office/Branch Office (ROBO).
Stan- It depends. You should implement client-side dedupe. Get the Cloud Provider to waive telecom bandwidth charges.
Steve- Consider recovery scenarios, and try to maintain control.
Jon- "Clouds" are bulls@#$ marketing. WAN latency will pile up.
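Stan's suggestion of client-side deduplication works because the client sends only data the provider has not already stored. The Python sketch below illustrates the idea with fixed-size chunks and SHA-256 fingerprints; the 4 KB chunk size and the `server_index` set are hypothetical stand-ins, and commercial products typically use more sophisticated, often variable-size, chunking.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


def fingerprint_chunks(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks and yield (SHA-256 digest, chunk)."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


def chunks_to_send(data, server_index):
    """Return only the chunks whose fingerprints the provider lacks.

    server_index is a hypothetical set of digests already stored remotely.
    Repeated chunks within the same data are also sent only once.
    """
    pending = {}
    for digest, chunk in fingerprint_chunks(data):
        if digest not in server_index and digest not in pending:
            pending[digest] = chunk
    return pending


if __name__ == "__main__":
    backup = b"A" * 8192 + b"B" * 4096  # three chunks, two of them unique
    already_stored = {hashlib.sha256(b"A" * 4096).hexdigest()}
    pending = chunks_to_send(backup, already_stored)
    print(f"{len(pending)} of 3 chunks cross the WAN")
```

Only the unseen chunks travel over the WAN, which is what lowers the telecom bandwidth a cloud backup consumes.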
What are the top issues IT leaders should be discussing with the Storage Managers?
Stan- To ensure that SLAs are met, but not exceeded, by design; to automate; and to evaluate SAN/NAS ratios.
Steve- Server virtualization is putting the spotlight on storage. Failure to implement storage virtualization is becoming the gate that slows down server virtualization adoption.
Jon- Insist on management features from all storage vendors, try to separate feature/function from the underlying hardware layer. See IBM's [Project Zero].
Rich- Efficiency, Archiving, Thin Provisioning, Compression, Data Protection & Retention, Backup Redesign to protect endpoints like laptops and cell phones.
When does Archive eliminate Backup?
The need for protection never goes away. There are two kinds of data: "originals" and "derivatives", and two kinds of disk: "failed" and "not yet failed".
Given SATA and SAS drives, what is the future of 10K/15K RPM drives?
There is no future for these faster drives, they are going away.
What is the biggest challenge for adopting archive?
It is easy to move data out of production systems, but difficult to make these archives accessible for eDiscovery and Search. There is also concern about changing data formats. Adobe has changed the format of PDF a whopping 33 times.
This was by far the most entertaining section of the day! Hand-held devices allowed the audience to vote which answers they liked best.
Doug Balog, IBM VP and Business Line Executive for Storage, presented Smart Archiving. Citing research by Jon Toigo, Doug indicated that 40 percent of data on disk should be archived. Sadly, the vast majority of companies continue to use their backups as archives. There is a better way to do archives, one that addresses the needs of four use cases:
The IBM Information Archive for email, files and eDiscovery offers full text indexing. A well-deployed archive strategy can save up to 60 percent in backup costs, and reduce backup times by 80 percent. IBM offers advanced analytics and visualization for archive data.
An analysis of a global insurance company found that it kept, on average, 120 copies of every email sent. This was the result of an average of 12 copies of each email, multiplied by 10 backups of the email repository.
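The arithmetic is simple, but it shows how the two factors compound: every copy in the live repository is captured again by each retained backup. A minimal Python sketch follows; the 100 KB message size is a hypothetical value, not from the analysis.

```python
# Reproduce the email-copy arithmetic from the insurance company analysis.
# The averages (12 copies per email, 10 retained backups) come from the text;
# the 100 KB message size in the example is hypothetical.


def retained_copies(copies_per_email, backups_of_repository):
    """Every backup of the repository captures every live copy again."""
    return copies_per_email * backups_of_repository


def retained_kb(message_kb, copies_per_email, backups_of_repository):
    """Total storage consumed by one message across all copies and backups."""
    return message_kb * retained_copies(copies_per_email, backups_of_repository)


if __name__ == "__main__":
    print(retained_copies(12, 10))   # copies of each email kept
    print(retained_kb(100, 12, 10))  # KB kept for a 100 KB message
```

Under these assumptions, a single 100 KB message ends up occupying 12,000 KB across the repository and its backups.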
Banjercito, a bank in Mexico, has a 10-year retention requirement from government regulations.
The new LTFS Library Edition allows library-based access to files stored on tape cartridges. The new TS3500 Library Connector means that a single system of connected tape libraries can hold up to 2.7 Exabytes (EB) of data.
Archive Industry Perspectives
Steve Duplessie from Enterprise Strategy Group [ESG] gave his views on the challenges of volume, access and cost. His definition of archive: the long-term retention of information in a separate environment for compliance, eDiscovery and business reference purposes. Steve advocates a purpose-built solution for archive. There are three major challenges to implementing an archive solution:
Getting Participation -- Steve feels that key stakeholders have inappropriate expectations of what archive is, or can be.
Define Tasks -- Steve argues that archive is very much a process-oriented approach, and tasks must fit business processes and procedures.
Prepare for Future Content Types -- the frequent change of standard and proprietary data types poses a real challenge for long-term retention of data.
For example, the Financial Industry Regulatory Authority [FINRA] oversees 4,000 brokerage firms and 600,000 broker/dealers. It has mandated the storing of digital data related to stock trades, which can include text messages, voice messages, and emails. FINRA continues to expand this definition, so soon it could include tweets on Twitter, for example.
Steve feels there are four key requirements for archive:
Support for email, such as an email application plug-in
Off-line access to archived data
Support for mobile devices, such as smartphones
Basic search capabilities
Companies are starting to take archive seriously. About 35 percent of firms surveyed have adopted archive, and another 36 percent plan to do so in the next 12-24 months. Enterprise archive grew over 200 percent from 2007 to 2009. Steve agrees that not everything needs to be stored on disk. Retention periods greater than six years dictate the need for tape.
Current systems may not meet today's requirements, and data loss and downtime costs have skyrocketed. Data Protection and Retention projects can represent a gold mine of savings: new capabilities can greatly lower costs, allowing companies to shift resources over to revenue generation.
Big Data, New Physics and Geospatial Super-Food
I would vote this the best session of the day! For all those confused about what the heck "Big Data" means, Jeff has the best explanation. Jeff Jonas is an IBM Distinguished Engineer and the Chief Scientist of Entity Analytics. He had just finished his 17th marathon on Saturday, and his fingers were bandaged.
Jeff had founded the Systems Research and Design (SR&D) company, known for creating NORA (non-obvious relationship awareness) used by Las Vegas casinos to identify fraud. SR&D was acquired by IBM back in 2005. Jeff is focused on sensemaking of streams. He feels many companies are suffering from "Enterprise Amnesia".
"The data must find the data .. and the relevance must find the user."
-- Jeff Jonas
Jeff's metaphor for Big Data is a jigsaw puzzle without the picture on the outside of the box. To demonstrate his point, he presented a pile of jigsaw puzzle pieces and asked four teenagers to put the puzzle together without the advantage of the picture on the box. What he had not told them was that he had mixed four different puzzles together, removing 10 to 20 percent of the pieces from each. He also added some duplicate pieces from a fifth puzzle identical to one of the four, and, just to make things fun, included a dozen pieces from a sixth puzzle to mess with their heads. Within a few hours, the kids had figured out that there were four puzzles, that some pieces were duplicates, and that some pieces did not fit any of the four puzzles.
"You can't squeeze knowledge from a pixel."
-- Jeff Jonas
This approach favors false negatives. New observations can reverse earlier conclusions, and as the picture emerges, it sharpens the focus on new information. More data can provide better predictions. "Bad" data, including misspelled words and mis-coded categories, was often discarded or corrected on the basis of "Garbage In, Garbage Out", but can now be useful from a Big Data perspective.
Take for example the 600 billion records of "location data" captured from cell phones every day. With regular triangulation from cell phone towers, this information can pinpoint you to within 60 meters; adding GPS improves this to within 20 meters, and adding Wi-Fi improves it further to 10 meters. While this data is "de-identified" so as not to identify individual users, re-identification is relatively trivial. Jeff's system is able to predict where a person will be next Thursday at 5:35pm with 87 percent accuracy.
Thus, Big Data represents an asset: an accumulation of context. Real-time analytics can be a competitive advantage. These streams of data will need persistent storage and massive I/O capabilities. In one example, Jeff processed 4,200 separate sources of information and was able to identify "dead votes": votes cast in the names of people who had died in prior years, indicating voter fraud.
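For readers curious how tower measurements turn into a position, here is a textbook trilateration sketch in Python. It assumes idealized, noise-free distance estimates to three towers on a flat 2-D plane, in meters; real networks use a variety of positioning methods and noisy signals, which is why accuracy is quoted in tens of meters. This is a generic illustration, not Jeff's system.

```python
import math


def trilaterate(towers, distances):
    """Estimate an (x, y) position from three towers and measured ranges.

    Subtracting the circle equation for tower 1 from those of towers 2
    and 3 yields two linear equations, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = towers
    r1, r2, r3 = distances
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 - x1**2 + x3**2 - y1**2 + y3**2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("towers must not be collinear")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


if __name__ == "__main__":
    towers = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]  # tower positions (m)
    phone = (30.0, 40.0)                                # true position
    ranges = [math.dist(t, phone) for t in towers]      # idealized ranges
    x, y = trilaterate(towers, ranges)
    print(f"estimated position: ({x:.1f}, {y:.1f})")
```

With noisy real-world ranges, the three circles rarely meet at one point, so practical systems fit a best estimate over many measurements instead of solving exactly.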
Jeff's latest project, codenamed G2, will tackle not just people, but everything from proteins to asteroids.
Normally, the worst time slot is the hour after lunch, but these presentations kept people's attention.
This year marks the 10-year anniversary of IBM's introduction of LTO tape technology. IBM is a member of the Linear Tape-Open consortium, which consists of IBM, HP and Quantum, referred to as "Technology Provider Companies" or TPCs. In an earlier job role, I was the "portfolio manager" for both the LTO and Enterprise tape product lines.
Today, we held a celebration in Tucson, with cake and refreshments.
IBM executives Doug Balog, IBM VP of Storage Platform, and Sanjay Tripathi, the new IBM Director and Business Line Executive for Tape, VTL and Archive systems, presented the successes of LTO tape over the past 10 years.
To date, over 3.5 million LTO tape drives and over 150 million LTO tape cartridges have been shipped, a testament to the remarkable marketplace acceptance of the technology.
In honor of this event, I decided to interview Bruce Master, IBM Senior Program Manager for Data Protection Systems, about this 10-year anniversary.
10 years of LTO technology is a great milestone. How is this especially significant to IBM and its clients?
According to IDC data, IBM has held the #1 position in market share for total worldwide branded tape revenue for over 7 years, and IBM is still #1 in branded midrange tape revenue, which includes the LTO tape technologies. IBM was the first drive manufacturer to deliver LTO-1 drives, back in September 2000, the first to deliver tape drive encryption to the marketplace on LTO-4 drives, and is shipping LTO generation 5 drives and libraries. IBM is the author of the new Linear Tape File System (LTFS) specification that has been adopted by the TPCs. This file system revolutionizes how tape can be used, as if it were a giant 1.5 terabyte removable USB memory stick that can be accessed with directory tree structures and drag-and-drop functionality. With LTO's built-in real-time compression, a single tape cartridge can hold up to 3TB of data (assuming a 2:1 compression ratio).
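The 3TB figure is simply native capacity multiplied by the compression ratio, as this minimal Python sketch shows. It assumes the 1.5 TB LTO-5 native capacity and nominal 2:1 ratio cited above; the 1.5:1 case in the example is a hypothetical value, and actual ratios depend on the data being written.

```python
# Effective capacity of a tape cartridge with drive-side compression.
# 1.5 TB is the LTO-5 native capacity cited above; 2:1 is the nominal
# ratio behind the "up to 3TB" figure. Actual ratios depend on the data:
# already-compressed or encrypted files may compress very little.

LTO5_NATIVE_TB = 1.5


def effective_capacity_tb(native_tb, compression_ratio):
    """Capacity seen by the user for a given average compression ratio."""
    if compression_ratio < 1:
        raise ValueError("compression ratio is expressed as N:1 with N >= 1")
    return native_tb * compression_ratio


if __name__ == "__main__":
    for ratio in (1.0, 1.5, 2.0):
        print(f"{ratio}:1 -> {effective_capacity_tb(LTO5_NATIVE_TB, ratio)} TB")
```

Data that is already compressed gets little benefit, so it is safest to plan capacity around the native figure.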
The Linear Tape File System has been getting a lot of attention. Where can we learn more about it?
Why is tape still a critical part of a storage infrastructure?
Tape is low cost and provides critical off-line portable storage to help protect data from attacks that can occur with on-line data. For instance, on-line data is at risk of attack from a virus, hacker, system error, disgruntled employee, and more. Because tape is off-line and not accessible by the system, it protects against these forms of corruption. LTO technology also provides write-once read-many (WORM) tape media to help address compliance requirements that specify non-erasable, non-rewriteable (NENR) storage, hardware encryption to secure data, and a low-cost, long-term archive medium. When data cools off, or becomes infrequently accessed, why keep it on spinning disk? Move it to tape, where it is much greener and lower cost. A tape in a slot on a shelf consumes minimal energy.
So tape is not dead?
Ha! Far from it. Seems like disk-only "specialty shop" storage vendors that don’t have tape in their sales portfolio are the ones that propagate that myth. In reality, storage managers are tasked with meeting complex objectives for performance, compliance, security, data protection, archive and total cost of ownership. Optimally, a blend of disk and tape in a tiered infrastructure can best address these objectives. You can’t build a house with just a hammer. IBM has a rich tool kit of storage offerings including disk, tape, software, services and deduplication technologies to help clients address their needs.
Do you have an example of a client who was saved by tape?
Yes indeed. Estes Express, a large trucking firm, was hit by a hurricane that flooded their data center and destroyed all systems. Fortunately, the night before, they had backed up all data onto IBM tape and moved the cartridges offsite! The company survived and has since implemented a best-practices data protection strategy that combines disk-to-disk-to-tape (D2D2T) using LTO tape at the primary site with a remote global mirrored site that is also backed up to LTO tape.
So tape saved the day. What is the outlook for tape innovation in the future?
The future is bright for tape. Earlier this year, IBM and Fujifilm were able to [demonstrate a tape density achievement] that could enable a native 35TB tape cartridge capacity! This shows a long roadmap ahead for tape and a continued good night’s sleep for storage managers knowing that their precious data will be safe.
Of course, LTO tape is just one of the many reasons IBM is a successful and profitable leader in the IT storage industry. Doug Balog talked about his experiences in London for the [October 7th launch] of IBM DS8800, Storwize V7000 and SAN Volume Controller 6.1. Sanjay Tripathi showed recent successes with IBM's ProtecTIER Data Deduplication Solution and Information Archive products.
I would like to thank Bruce Master for his time in completing this interview. To learn more about IBM tape and storage offerings, visit [ibm.com/storage].
Continuing my week in Washington DC for the annual [2010 System Storage Technical University], here is my quick recap of the keynote sessions presented Monday morning. Marlin Maddy, Worldwide Technical Events Executive for IBM Systems Lab Services and Training, served as emcee.
Roland Hagan, IBM Vice President for IBM System x server platform, presented on how IBM is redefining the x86 computing experience. More than 50 percent of all servers are x86 based. These x86 servers are easy to acquire, enjoy a large application base, and can take advantage of a readily available skilled workforce for administration. The problem is that 85 percent of x86 processing power remains idle, energy costs are 8 times what they were 12 years ago, and management costs are now 70 percent of the IT budget.
IBM has the number one market share for scalable x86 servers. Roland covered the newly announced eX5 architecture, which has been deployed in both rack-optimized models and IBM BladeCenter blade servers. These can offer twice the memory capacity of competitive offerings, which is important for today's server virtualization, database and analytics workloads. This includes 40 and 80 DIMM models of blades, and 64 to 96 DIMM models of rack-optimized systems. IBM also announced eXFlash, internal Solid State Drives accessible at bus speeds. FlexNode allows a 4-node system to dynamically change into two separate 2-node systems.
By 2013, analysts estimate that 69 percent of x86 workloads will be virtualized, and that 22 percent of servers will be running some form of hypervisor software. By 2015, this grows to 78 percent of x86 workloads being virtualized, and 29 percent of servers running a hypervisor.
Doug Balog, IBM Vice President and Disk Storage Business Line Executive, presented how the growth of information results in a "perfect storm" for the storage industry. Storage admins are focused on managing storage growth and the related costs and complexity, proper forecasting and capacity planning, and backup administration. IBM's strategy is to help clients in the following areas:
Storage Efficiency - getting the most use out of the resources you invest
Service Delivery - ensuring that information gets to the right people at the right time, and simplifying reporting and provisioning
Data Protection - protecting data against unethical tampering, unauthorized access, and unexpected loss and corruption
He wrapped up his talk by covering the success of DS8700 and XIV. In fact, 60 percent of XIV sales are to EMC customers. The TCO of an XIV is less than half the TCO of a comparable EMC VMAX disk system.
Dave McQueeney, IBM Vice President for Strategy and CTO for US Federal, covered how IBM's Smarter Planet vision for smarter cities, smarter healthcare, smarter energy grid and smarter traffic is being adopted by the public sector. Almost every data center in the US Federal government is out of power, floor space and/or cooling capability. An estimated 80 percent of US Federal government IT budgets is spent on maintenance and ongoing operations, leaving little for the big transformational projects that President Barack Obama wants to accomplish.
Who has the most active Online Transaction Processing (OLTP) system? You might guess a big bank, but it is the US Department of Homeland Security (DHS), with a system processing 600 million transactions per day. Another government agency is #2, and the top banking application is only #3. The IBM mainframe solved problems 10 to 15 years ago that distributed systems are only now encountering. Worldwide, more than 80 percent of banks use mainframes to handle their financial transactions.
IBM's recent POWER7 servers are proving successful in the field. For example, Allianz was able to consolidate 60 servers to 1. Running DB2 on a POWER7 server is 38 percent less expensive than running Oracle on x86 Nehalem processors. For Java, running a JVM on POWER7 is 73 percent better than running a JVM on x86 Nehalem.
The US federal government ingests a large amount of data and has huge 10-20 PB data warehouses. In fact, the amount of data received every year by the US federal government alone exceeds the total capacity of all disk drives produced by all drive manufacturers. This means that incoming data must be processed through "data reduction" or it is gone forever.
The last keynote for Monday was given by Clod Barrera, IBM Distinguished Engineer and Chief Technical Strategist for System Storage. He started out shocking the audience with his view that the "disk drive industry is a train wreck". While R&D in disk drives enjoyed a healthy improvement curve up to about 2004, it has now slowed down, getting more difficult and more expensive to improve performance and capacity of disk drives. The rest of his presentation was organized around three themes:
Integrated Stacks - while newcomers like Oracle/Sun and the VCE coalition are promoting the benefits of integrated stacks, IBM has been doing this for the past five decades. New advancements in Server and Storage virtualization provide exciting new opportunities.
Integrated Systems - solutions like IBM Information Archive and SONAS, and new features like Easy Tier that help clients adopt SSDs transparently. As it gets harder and harder to scale up, IBM has moved to innovative scale-out architectures.
Integrated Data Center management - companies are now realizing that management and governance are critical factors of success, and that this needs to be integrated between traditional IT, private, public and hybrid cloud computing.
This was a great, inspiring start for what looks like an awesome week!