Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is a Master Inventor and Senior IT Specialist for the IBM System Storage product line at the
IBM Executive Briefing Center in Tucson, Arizona, and a featured contributor
to IBM's developerWorks. In 2011, Tony celebrated his 25th anniversary with IBM Storage on the same day as IBM's Centennial. He is
the author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services. You can also follow him on Twitter @az990tony.
(Short URL for this blog: ibm.co/Pearson)
Those who prefer the one-stop shopping of an IT Supermarket, with companies like IBM, HP and Dell that offer a complete set of servers, storage, switches, software and services, what we call "The Five S's".
Those who prefer shopping for components at individual specialty shops, like butchers, bakers, and candlestick makers, hoping that this singular focus means the products are best-of-breed in the market. Companies like HDS for disk, Quantum for tape, and Symantec for software come to mind.
My, how the IT landscape for vendors has evolved in just the past five years! Cisco starts to sell servers, and enters a "mini-mall" alliance with EMC and VMware to offer the Vblock integrated stack of server, storage and switches with VMware as the software hypervisor. For those not familiar with the concept of mini-malls, these are typically rows of specialty shops. A shopper can park their car once, and do all their shopping from the various shops in the mini-mall. It is not quite the "one-stop" shopping of a supermarket, but it tries to address the same need.
("Who do I call when it breaks?" -- The three companies formed a puppet company, the Virtual Computing Environment company, or VCE, to help answer that question!)
Among the many things IBM has learned in its 100+ years of experience is that clients want choices. Cisco figured this out also, and partnered with NetApp to offer the aptly-named FlexPod reference architecture. In effect, Cisco has two boyfriends: when she is with EMC, it is called a Vblock, and when she is with NetApp, it is called a FlexPod. I was lucky enough to find this graphic to help explain the three-way love triangle.
Did this move put a strain on the relationship between Cisco and EMC? Last month, EMC announced VSPEX, a FlexPod-like approach that provides a choice of servers, and some leeway for resellers to make choices to fit client needs better. Why limit yourself to Cisco servers, when IBM and HP servers are better? Is this an admission that Vblock has failed, and that VSPEX is the new way of doing things? No, I suspect it is just EMC's way to strike back at both Cisco and NetApp in what many are calling the "Stack Wars". (See [The Stack Wars have Begun!], [What is the Enterprise Stack?], or [The Fight for the Fully Virtualized Data Center] for more on this.)
(FTC Disclosure: I am both an employee and shareholder of IBM, so the U.S. Federal Trade Commission may consider this post a paid, celebrity endorsement of the IBM PureFlex system. IBM has working relationships with Cisco, NetApp, and Quantum. I was not paid to mention, nor have I any financial interest in, any of the other companies mentioned in this blog post. )
Last month, IBM announced its new PureSystems family, ushering in a [new era in computing]. I invite you all to check out the many "Patterns of Expertise" available at the [IBM PureSystems Centre]. This is like an "app store" for the data center, and what I feel truly differentiates IBM's offerings from the rest.
The trend is obvious. Clients who previously purchased from specialty shops are discovering that building workable systems from piece-parts supplied by separate vendors is expensive and challenging. IBM PureFlex™ systems eliminate a lot of the complexity and effort, but still offer plenty of flexibility: choice of server processor types, choice of server and storage hypervisors, and choice of various operating systems.
Every January, we look back into the past as well as look into the future for trends to watch for the upcoming year. Ray Lucchesi of Silverton Consulting has a great post looking back at the [Top 10 storage technologies over the last decade]. I am glad to see that IBM has been involved with and instrumental in all ten technologies.
Looking into the future, Mark Cox of eChannel has an article [Storage Trends to Watch in 2011], based on his interviews with two fellow IBM executives: Steve Wojtowecz, VP of storage software development, and Clod Barrera, distinguished engineer and CTO for storage. Let's review the four key trends:
Cloud Storage and Cloud Computing
No question: Cloud Computing will be the battleground of the IT industry this decade. I am amused by the latest spate of Microsoft commercials where problems are solved with someone saying "...to the cloud". Riding on the coattails of this is "Cloud Storage", the ability to store data across an Internet Protocol (IP) network, such as 10Gb Ethernet, in support of Cloud Computing applications. Cloud Storage protocols in the running include NFS, CIFS, iSCSI and FCoE.
Mark writes "..vendors who aren't investing in cloud storage solutions will fall behind the curve."
Economic Downturn forces Innovation
The old British adage applies: "Necessity is the mother of invention." The status quo won't do. In these difficult economic times, IT departments are running on constrained budgets and staff. This forces people to evaluate innovative technologies for storage efficiency like real-time compression and data deduplication to make better use of what they currently have. It also is forcing people to take a "good enough" attitude, instead of paying premium prices for best-of-breed they don't really need and can't really afford.
IT Service Management
Companies are getting away from managing individual pieces of IT kit, and are focusing instead on the delivery of information, from the magnetic surface of disk and tape media, to the eyes and ears of the end users. The deployment mix of private, hybrid and public clouds makes it even more important to measure and manage IT as a set of services that are delivered to the business. IT Service Management software can be the glue, helping companies implement ITIL v3 best practices and management disciplines.
Smarter Data Placement
A recent survey by "The Info Pro" analysts indicates that "managing storage growth" is considered more critical than "managing storage costs" or "managing storage complexity".
This tells me that companies are willing to spend a bit extra to deploy a tiered information infrastructure if it will help them manage storage growth, which typically ranges around 40 to 60 percent per year. While I have discussed the concept of "Information Lifecycle Management" (ILM), for the past four years on this blog, I am glad to see it has gone mainstream, helped in part with automated storage tiering features like IBM System Storage Easy Tier feature on the IBM DS8000, SAN Volume Controller and Storwize V7000 disk systems. Not all data is created equal, so the smart placement of data, based on the business value of the information contained, makes a lot of sense.
These trends are influencing what solutions the various different vendors will offer, and will influence what companies purchase and deploy.
Last week, in Computer Technology Review's article [Tiering: Scale Up? Scale Out? Do Both], Mark Ferelli interviews fellow blogger Hu Yoshida, CTO of Hitachi Data Systems (HDS). Here's an excerpt:
"MF/CTR: A global cache should be required to implement that common pool that you’re talking about going across all tiers.
Hu/HDS: Right. So that is needed to get to all the resources. Now with our system, we can also attach external storage behind it for capacity so that as the storage ages out or becomes less active we can move it to the external storage. They would certainly have less performance capability, but you don’t need it for the stale data that we’re aging down. Right now we’re the only vendor that can provide this type of tiering.
If you look at other people who do virtualization like IBM’s SVC, the SVC has no storage within it because it’s sitting so if you attach any storage behind it, there is some performance degradation because you have this appliance sitting in front. That appliance is also very limited in cache and very limited in the number of storage boards on it. It cannot really provide you additional performance than what is attached behind it. And in fact, it will always degrade what is attached behind it because it’s not storage, where as our USP is storage and it has a global cache and it has thousands of port connections, load balancing and all that. So our front end can enhance existing storage that sits behind it."
This is not the first time I have had to correct misperceptions that Hu and others have about IBM's SAN Volume Controller (SVC). This month marks my four year "blogoversary", and I seem to spend a large portion of my blogging time setting the record straight. Here are just a few of my favorite posts setting the record straight on SVC back in 2007:
Since day 1, the SAN Volume Controller has focused primarily on external storage. The early models had just battery-protected DRAM cache memory, but the most recent model of the SVC, the 2145-CF8, adds support for internal SLC NAND flash solid state drives. To fully appreciate how SVC can help improve the performance of the disks that are managed, I need to use some visual aids.
In this first chart, we look at a 70/30/50 workload. This indicates that 70 percent of the IOPS are reads, 30 percent writes, and 50 percent can be satisfied as cache hits directly from the SVC. For the reads, this means that 50 percent are read-hits satisfied from SVC DRAM cache, and 50 percent are read-miss that have to get the data from the managed disk, either from the managed disk's own cache, or from the actual spinning drives inside that managed disk array.
For writes, all writes are cache-hits, but some of them will be destaged to the managed disk. Typically, we find that a third of writes are over-written before this happens, so only two-thirds are written down to managed disk.
In this example, the SVC reduced the burden of the managed disk from 100,000 IOPS down to 55,000, which is 35,000 reads and 20,000 writes. Some have argued against putting one level of cache (SVC) in front of another level of cache (managed disk arrays). However, CPU processor designers have long recognized the value of hierarchical cache with L1, L2, L3 and sometimes even L4 caches. The cache-hits on SVC are faster than most disk system's cache-hits.
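The arithmetic behind the 55,000 figure can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above; the function name and parameters are my own, and the two-thirds destage fraction is the rule of thumb mentioned in the text, not a guaranteed value.

```python
from fractions import Fraction

def backend_load(front_end_iops, read_pct, read_hit_pct,
                 destage_fraction=Fraction(2, 3)):
    """Estimate the IOPS that reach the managed disk behind a caching layer.

    Hypothetical helper for illustration only:
      read_pct         - percent of front-end IOPS that are reads
      read_hit_pct     - percent of reads satisfied from the front cache
      destage_fraction - share of writes eventually written to managed disk
    """
    reads = Fraction(front_end_iops * read_pct, 100)
    writes = front_end_iops - reads
    read_misses = reads * (100 - read_hit_pct) / 100   # must go to managed disk
    destaged_writes = writes * destage_fraction         # survive over-writes in cache
    return int(read_misses), int(destaged_writes)

# The 70/30/50 workload at 100,000 front-end IOPS
read_misses, destaged_writes = backend_load(100_000, read_pct=70, read_hit_pct=50)
total_backend = read_misses + destaged_writes
print(read_misses, destaged_writes, total_backend)
```

Changing `read_hit_pct` or the destage fraction shows how sensitive the back-end load is to the workload mix, which is why your mileage may vary.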
This is a Ponder curve, mapping millisecond response (MSR) times for different levels of I/O per second, named after John Ponder, the IBM scientist who created them. Most disk array vendors will publish similar curves for each of their products. In this case, we see that 100,000 IOPS would cause a 25 msec response time, but when the load is reduced to 55,000 IOPS, the average response time drops to only 7 msec.
To be fair, the SVC does introduce 0.06 msec of additional latency on read-misses, so let's call this 7.06 msec. This tiny amount of latency could be what Hu Yoshida was referring to when he said there was "some performance degradation". There are other storage virtualization products in the market that do not provide caching to boost performance, but rather just map incoming requests to outgoing requests, and these can indeed slow down every I/O they process. Perhaps Hu was thinking of those instead of IBM's SVC when he made his comments.
Of course, not all workloads are 70/30/50, and not every disk array is driven to its maximum capability, so your mileage may vary. As we slide down the left of the curve where things are flatter, the improvement in performance shrinks.
[Table: IOPS before SVC | IOPS after SVC | MSR before SVC | MSR after SVC]
Hitachi's offerings, including the HDS USP-V, USP-VM and their recently announced Virtual Storage Platform (VSP) sold also by HP under the name P9500, have similar architecture to the SVC and can offer similar benefits, but oddly the Hitachi engineers have decided to treat externally attached storage as second-class citizens instead. Hu mentions data that "ages out or becomes less active we can move it to the external storage." IBM has chosen not to impose this "caste" system onto its design of the SAN Volume Controller.
The SVC has been around since 2003, before the USP-V came to market, and IBM has sold over 20,000 SVC nodes over the past seven years. The SVC can indeed improve performance of managed disk systems, in some cases by a substantial amount. The 0.06 msec latency on read-miss requests represents less than 1 percent of total performance in production workloads. SVC nearly always improves performance, and in the worst case, provides the same performance but with added functionality and flexibility. The performance boost comes as a delightful surprise to most people who start using the SVC.
To learn more about IBM's upcoming products and how IBM will lead in storage this decade, register for next week's webcast "Taming the Information Explosion with IBM Storage" featuring Dan Galvan, IBM Vice President, and Steve Duplessie, Senior Analyst and Founder of Enterprise Strategy Group (ESG).
Continuing my week in Washington DC for the annual [2010 System Storage Technical University], I presented a session on Storage for the Green Data Center, and attended a System x session on Greening the Data Center. Since they were related, I thought I would cover both in this post.
Storage for the Green Data Center
I presented this topic in four general categories:
Drivers and Metrics - I explained the three key drivers for consuming less energy, and the two key metrics: Power Usage Effectiveness (PUE) and Data Center Infrastructure Efficiency (DCiE).
Storage Technologies - I compared the four key storage media types: Solid State Drives (SSD), high-speed (15K RPM) FC and SAS hard disk, slower (7200 RPM) SATA disk, and tape. I had comparison slides that showed how IBM disk was more energy efficient than competition, for example DS8700 consumes less energy than EMC Symmetrix when compared with the exact same number and type of physical drives. Likewise, IBM LTO-5 and TS1130 tape drives consume less energy than comparable HP or Oracle/Sun tape drives.
Integrated Systems - IBM combines multiple storage tiers in a set of integrated systems managed by smart software. For example, the IBM DS8700 offers [Easy Tier] to offer smart data placement and movement across Solid-State drives and spinning disk. I also covered several blended disk-and-tape solutions, such as the Information Archive and SONAS.
Actions and Next Steps - I wrapped up the talk with actions that data center managers can take to help them be more energy efficient, from deploying the IBM Rear Door Heat Exchanger, or improving the management of their data.
Greening of the Data Center
Janet Beaver, IBM Senior Manager of Americas Group facilities for Infrastructure and Facilities, presented on IBM's success in becoming more energy efficient. The price of electricity has gone up 10 percent per year, and in some locations, 30 percent. For every 1 Watt used by IT equipment, there are an additional 27 Watts for power, cooling and other uses to keep the IT equipment comfortable. At IBM, data centers represent only 6 percent of total floor space, but 45 percent of all energy consumption. Janet covered two specific data centers, Boulder and Raleigh.
At Boulder, IBM keeps a 48-hour reserve of gasoline (to generate electricity in case of an outage from the power company) and 48 hours of chilled water. Many power outages are less than 10 minutes, which can easily be handled by the UPS systems. At least 25 percent of the Computer Room Air Conditioners (CRAC) are also on UPS, so that there is some cooling during those minutes, within the ASHRAE guidelines of 72-80 degrees Fahrenheit. Since gasoline gets stale, IBM runs the generators once a month, which serves as a monthly test of the system, and clears out the lines to make room for fresh fuel.
The IBM Boulder data center is the largest in the company: 300,000 square feet (the equivalent of five football fields)! Because of its location in Colorado, IBM enjoys "free cooling" using outside air temperature 63 percent of the year, resulting in a PUE rating of 1.3. Electricity is only 4.5 US cents per kWh. The center also uses 1 million kWh per year of wind energy.
The Raleigh data center is only 100,000 square feet, with a PUE rating of 1.4. The Raleigh area enjoys 44 percent "free cooling" and electricity costs 5.7 US cents per kWh. The Leadership in Energy and Environmental Design [LEED] program has been updated to certify data centers. The IBM Boulder data center has achieved LEED Silver certification, and the IBM Raleigh data center has LEED Gold certification.
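To put the two PUE ratings in perspective, here is a small Python sketch of how the metrics relate. It uses the standard definitions (PUE is total facility power divided by IT equipment power, and DCiE is the reciprocal expressed as a percentage); the function names are my own, and the only inputs are the 1.3 and 1.4 ratings quoted above.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw

def dcie_percent(pue_rating):
    """Data Center Infrastructure Efficiency: reciprocal of PUE, as a percent."""
    return 100.0 / pue_rating

def overhead_per_it_watt(pue_rating):
    """Watts spent on power delivery, cooling, etc. for each watt of IT load."""
    return pue_rating - 1.0

for site, rating in (("Boulder", 1.3), ("Raleigh", 1.4)):
    print(f"{site}: PUE {rating}, DCiE {dcie_percent(rating):.1f}%, "
          f"overhead {overhead_per_it_watt(rating):.1f} W per IT watt")
```

By these definitions, a PUE of 1.3 works out to a DCiE of roughly 77 percent, and 1.4 to roughly 71 percent.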
Free cooling, electricity costs, and disaster susceptibility are just three of the 25 criteria IBM uses to locate its data centers. In addition to the 7 data centers it manages for its own operations, and 5 data centers for web hosting, IBM manages over 400 data centers of other clients.
It seems that Green IT initiatives are more important to the storage-oriented attendees than the x86-oriented folks. I suspect that is because many System x servers are deployed in small and medium businesses that do not have data centers, per se.
Continuing my coverage of the 30th annual [Data Center Conference], here is a recap of the Wednesday breakout sessions.
Private Cloud Computing at Bank of America – One Year Later
Prentice Dees, Senior VP for Systems Automation Engineering at Bank of America, did the happy dance celebrating their success implementing a private cloud. Bank of America merged with Merrill Lynch, has 29 million users residing in over 100 countries, and 5900 retail offices in 40 countries. They manage $1 billion US dollars in deposits, and $2.2 trillion in assets.
Rather than IaaS or PaaS, his team focused on Application-as-a-Service (AaaS). Their goal is to transform and move IT out of the way of the business. In his view, if a human has to touch a keyboard, then his team has failed.
He divides the work up into three layers:
Bones: These are the physical components, such as servers, storage, switches that provide capacity and interconnect.
Muscle: This is the translation layer, providing actions and reporting.
Brains: This is the layer for intelligent automation.
Provisioning new servers with storage involves three sets of steps. The first set of steps involves requesting approval. The second set of steps deploys the server. The third involves installing the application, loading the data and using it until End-of-Life. The second set of steps took 14 to 60 days before, and has been automated down to one to three hours.
The result is that he has improved server utilization 10x, storage is over-provisioned 4x, and they are now hosting over 11,000 server images, saving $20 million US dollars. Not only is this a lower cost per application deployed, but the process allows for lower-skilled personnel. He has over 500TB of virtual storage deployed, using thin provisioning, with only 128TB of physical disk. But they have only scratched the surface. Only 15 to 20 percent are virtualized in this manner, and they want to get to 80 percent within the next three years.
What makes an application not "Cloud-ready"? Prentice is a big fan of Linux and Open Source solutions. Some applications consume the entire server. In other cases, code changes are required. If possible, try to split up large applications into smaller Cloud-ready chunks.
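The "4x" over-provisioning figure follows from the two capacities quoted above. Here is a quick Python check; the helper name is hypothetical, and since the talk said "over 500TB", treat the result as a lower bound.

```python
def overprovision_ratio(virtual_tb, physical_tb):
    """Ratio of capacity promised to hosts versus physical disk behind it."""
    if physical_tb <= 0:
        raise ValueError("physical capacity must be positive")
    return virtual_tb / physical_tb

# Figures quoted in the session: 500TB virtual on 128TB physical
ratio = overprovision_ratio(500, 128)
print(f"Thin provisioning ratio: {ratio:.2f}x")  # just under 4x
```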
How many people on his team? There are currently 16 to 20 people on the team, but at its peak there were 30 people.
Rather than wasting time on capacity planning, his team focuses on a cost recovery model instead. Seed capital in combination with rock-solid recovery is the way to go. "All models are wrong," the saying goes, "but some are useful!"
A nice side benefit to this new approach is maintenance is greatly improved. Rather than rushing to fix problems, you roll the application over to another host machine, and then take your time fixing the failed hardware.
How does the team deal with requests for dedicated resources? Give them the keys to their own miniature private cloud. Let them provision from their dedicated resources using the same methods you use to provision everyone else. This allows them to get comfortable with the process, and eventually join the rest of the shared pool. Analytics can be used to find "rogue VMs" that don't play well with others.
Their automation is a mix of commercial and open source software, with home-grown scripts. They have one "Orchestration Management Data Base" (OMDB) to manage multiple disparate Configuration Management Data bases (CMDBs). The chargeback is not quite per individual pay-per-use, but more at the departmental level.
Aging Data: The Challenges of Long-Term Data Retention
The analyst defined "aging data" to be any data that is older than 90 days. A quick poll of the audience showed what type of data was the biggest challenge:
In addition to aging data, the analyst used the term "vintage" to refer to aging data that you might actually need in the future, and "digital waste" being data you have no use for. She also defined "orphaned" data as data that has been archived but not actively owned or managed by anyone.
You need policies for retention, deletion, legal hold, and access. Most people forget to include access policies. How are people dealing with data and retention policies? Here were the poll results:
The analyst predicts that half of all applications running today will be retired by 2020. Tools like "IBM InfoSphere Optim" can help with application retirement by preserving both the data and metadata needed to make sense of the information after the application is no longer available. App retirement has a strong ROI.
Another problem is that unstructured data keeps growing, but nobody is given the responsibility of "archivist" for this data, so it goes un-managed and becomes a "dumping ground". Long-term retention involves hardware, software and process working together. The reason that purpose-built archive hardware (such as IBM's Information Archive or EMC's Centera) fell short was that companies failed to get the appropriate software and process to complete the solution.
Cloud computing will help. The analyst estimates that 40 percent of new email deployments will be done in the cloud, such as IBM LotusLive, Google Apps, and Microsoft Office 365. This offloads the archive requirement to the public cloud provider.
One case study is the University of Minnesota Supercomputing Institute, which has three tiers for its storage: 136TB of fast storage for scratch space, 600TB of slower disk for project space, and 640TB of tape for long-term retention.
What are people using today to hold their long-term retention data? Here were the poll results:
The bottom line is that retention of aging data is a business problem, a technology problem, an economic problem and a 100-year problem.
A Case Study for Deploying a Unified 10G Ethernet Network
Brian Johnson from Intel presented the latest developments on 10Gb Ethernet. Case studies from Yahoo and NASA, both members of the [Open Data Center Alliance] found that upgrading from 1Gb to 10Gb Ethernet was more than just an improvement in speed. Other benefits include:
45 percent reduction in energy costs for Ethernet switching gear
80 percent fewer cables
15 percent lower costs
doubled bandwidth per server
Ruiping Sun, from Yahoo, found that 10Gb FCoE achieved 920 MB/sec, which was 15 percent faster than the 8Gb FCP they were using before.
IBM, Dell and other Intel-based servers support Single Root I/O Virtualization, or SR-IOV for short. NASA found that cloud-based HPC is feasible with SR-IOV. Using IBM General Parallel File System (GPFS) and 10Gb Ethernet, they were able to replace a previous environment based on 20 Gbps DDR InfiniBand.
While some companies are still arguing over whether to implement a private cloud, an archive retention policy, or 10Gb Ethernet, other companies have shown great success moving forward.
Happy Winter Solstice everyone! The Mayan calendar flipped over yesterday, and everything continued as normal.
The next date to watch out for is ... drumroll please ... April 8, 2014. This is the date Microsoft has decided to [drop support for Windows XP].
While many large corporations are actively planning to get off Windows XP, there are still many homes and individuals that are running on this platform.
When [Windows XP] was introduced in 2001, it could support systems with as little as 64MB of RAM. Nowadays, the latest versions of Windows require a minimum of 1GB for 32-bit systems, with 2GB or 3GB recommended.
That leaves Windows XP users on older hardware few choices:
Continue to run Windows XP, but without support (and hope for the best)
Upgrade their hardware with more RAM (and possibly more disk space) needed to run a newer level of Windows
Install a different operating system like Linux
Put the hardware in the recycle bin, and buy a new computer
Here is a personal example. A long time ago, I gave my sister a Thinkpad R31 laptop so that she could work from home. When she got a newer one, she passed this down to her daughter for doing homework. When my niece got a newer one, she passed this old laptop to her grandma.
Grandma is fairly happy with her modern PC running Windows XP. She plays all kinds of games, scans photographs, sends emails, listens to music on iTunes, and even uses Skype to talk to relatives. Her problem is that this PC is located upstairs, in her bedroom, and she wanted something portable that she could play music downstairs when she is playing cards with her friends.
"Why not use the laptop you have?" I asked. Her response: "It runs very slow. Perhaps it has a virus. Can you fix that?" I was up for the challenge, so I agreed.
(The Challenge: Update the Thinkpad R31 so that grandma can simply turn it on, launch iTunes or similar application, and just press a "play" button to listen to her music. It will be plugged in to an electrical outlet wherever she takes it, and she already has her collection of MP3 music files. My hope is to have something that is (a) simple to use, (b) starts up quickly, and (c) will not require a lot of on-going maintenance issues.)
Here are the relevant specifications of the Thinkpad R31 laptop:
The system was pre-installed with Windows XP, but was terribly down-level. I updated to Windows XP SP3 level, downloaded the latest anti-virus signatures, and installed iTunes. A full scan found no viruses. All this software takes up 14GB, leaving less than 6GB for MP3 music files.
The time it took from hitting the "Power-on" button to hearing the first note of music was over 14 minutes! Unacceptable!
If you can suggest what my next steps should be, please comment below or send me an email!
Am I dreaming? On his Storagezilla blog, fellow blogger Mark Twomey (EMC) brags about EMC's standard benchmark results, in his post titled [Love Life. Love CIFS.]. Here is my take:
A Full 180 degree reversal
For the past several years, EMC bloggers have argued, both in comments on this blog, and on their own blogs, that standard benchmarks are useless and should not be used to influence purchase decisions. While we all agree that "your mileage may vary", I find standard benchmarks are useful as part of an overall approach in comparing and selecting which vendors to work with, and which architectures or solution approaches to adopt, and which products or services to deploy. I am glad to see that EMC has finally joined the rest of the planet on this. I find it funny this reversal sounds a lot like their reversal from "Tape is Dead" to "What? We never said tape was dead!"
Impressive CIFS Results
The Standard Performance Evaluation Corporation (SPEC) has developed a series of NFS benchmarks. The latest, [SPECsfs2008], added support for CIFS. So, on the CIFS side, EMC's benchmarks compare favorably against previous CIFS tests from other vendors.
On the NFS side, however, EMC is still behind Avere, BlueArc, Exanet, and IBM/NetApp. For example, EMC's combination of Celerra gateways in front of V-Max disk systems resulted in 110,621 OPS with overall response time of 2.32 milliseconds. By comparison, the IBM N series N7900 (tested by NetApp under their own brand, FAS6080) was able to do 120,011 OPS with 1.95 msec response time.
Even though Sun invented the NFS protocol in the early 1980s, they take an EMC-like stance against the standard benchmarks that measure it. Last year, fellow blogger Bryan Cantrill (Sun) gave his [Eulogy for a Benchmark]. I was going to make points about this, but fellow blogger Mike Eisler (NetApp) [already took care of it]. We can all learn from this. Companies that don't believe in standard benchmarks can either reverse course (as EMC has done), or continue their downhill decline until they are acquired by someone else.
(My condolences to those at Sun getting laid off. Those of you who hire on with IBM can get re-united with your former StorageTek buddies! Back then, StorageTek people left Sun in droves, knowing that Sun didn't understand the mainframe tape marketplace that StorageTek focused on. Likewise, many question how well Oracle will understand Sun's hardware business in servers and storage.)
What's in a Protocol?
Both CIFS and NFS have been around for decades, and comparisons can sometimes sound like religious debates. Traditionally, CIFS was used to share files between Windows systems, and NFS for Linux and UNIX platforms. However, Windows can also handle NFS, while Linux and UNIX systems can use CIFS. If you are using a recent level of VMware, you can use either NFS or CIFS as an alternative to Fibre Channel SAN to store your external disk VMDK files.
The Bigger Picture
There is a significant shift going on from traditional database repositories to unstructured file content. Today, as much as [80 percent of data is unstructured]. Shipments this year are expected to grow 60 percent for file-based storage, and only 15 percent for block-based storage. With the focus on private and public clouds, NAS solutions will be the battleground for 2010.
So, I am glad to see EMC starting to cite standard benchmarks. Hopefully, SPC-1 and SPC-2 benchmarks are forthcoming?
Well it's Wednesday, and you know what that means... IBM Announcements.
(Normally, announcements are on Tuesdays, but we moved this one over to Wednesday to line up with our big launch event in Pinehurst, NC. )
A lot was announced today, so I decided to break it up into several separate posts. I will start with our Enterprise Systems: DS8870, TS7700 Release 3, and XIV Gen3.
Enterprise systems are the servers, storage and software at the core of an enterprise IT infrastructure. Enterprise systems enable a private cloud infrastructure at enterprise scale, with flexible service delivery models that provide dynamic efficiency for resource and workload management. They make sure critical data is always available across the enterprise, making it accessible in new ways so that actionable insights can be derived from advanced and operational analytics. They also provide ultimate security, ensuring the integrity of critical data while mitigating risk and providing assured compliance.
IBM System Storage DS8870® disk system
This new storage system is the next generation in IBM's DS8000 series, based on IBM's POWER7 chipset. Each CEC can have 2, 4, 8 or 16 cores. Like the DS8800, you can have a mix of 2.5-inch and 3.5-inch disk drives of different speeds and capacities, up to 1,536 drives in a four-frame configuration. The maximum cache is now 1TB usable. The combination of faster chipset and more cache can triple performance for some workloads!
All DS8870 systems ship standard with Full Disk Encryption (FDE) capable drives. The problem in the past was that people would buy a DS8000 with non-FDE drives, later want to activate encryption, and then discover that they had to swap out their drives for ones with the encryption chip built in. Now, all drives on the DS8870 have the encryption chip. This also allows Easy Tier sub-volume automated tiering to move encrypted data between all media types.
Flash optimization with DS8000 Easy Tier can improve performance up to 3 times with 3% of data on solid-state storage. Easy Tier is easy to deploy and runs automatically.
Support of the American National Standards Institute's (ANSI) T10 Data Integrity Field (DIF) standard. This is a feature that the mainframe has had for years, and is now being extended to distributed operating systems. The concept is simple. When sending data between server and storage, generate a checksum at the source, and then validate the checksum at the target. When you write a block of data, the server generates the checksum, and the DS8870 validates the checksum on arrival. When you read the data back, the DS8870 generates the checksum, and the server validates it on arrival. This ensures that data was not corrupted in between. There is a great write-up on IBM developerWorks: [End-to-end data protection using T10 standard data integrity field].
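The write-then-validate handshake described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the CRC-16 polynomial 0x8BB7 that T10 DIF defines for its guard tag, but the function names are made up for this example, and the application and reference tags that real DIF also carries are left out.

```python
def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with the T10 DIF polynomial 0x8BB7 (initial value 0, no reflection)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def send_block(data: bytes) -> tuple[bytes, int]:
    """Source side (hypothetical helper): attach a guard checksum before the block leaves."""
    return data, crc16_t10dif(data)


def receive_block(data: bytes, guard: int) -> bytes:
    """Target side (hypothetical helper): recompute the checksum, reject on mismatch."""
    if crc16_t10dif(data) != guard:
        raise ValueError("guard checksum mismatch: block corrupted in flight")
    return data
```

On a write, the server plays the role of `send_block` and the disk system plays `receive_block`. On a read, the roles are reversed.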
Energy Efficient. The DS8870 consumes less energy than its predecessor, the DS8800. For example, a fully-configured four-frame DS8870 with 1,536 disk drives consumes only 23.2 kW, whereas a DS8800 with the same number of drives consumed 26.3 kW. By comparison, the DS8700 with five frames and 1,024 drives consumed 29.2 kW.
Support for new System z load balancing algorithm. System z Workload Manager now interacts with the DS8870 I/O Priority Manager to optimize designated Quality of Service (QoS) levels. We also have the fastest operational analytics solution, with DB2 List Prefetch cache optimization combined with DS8870 High Performance FICON (zHPF) integration. This solution increases DB2 query performance up to 11 times with disk, and up to 60 times with solid-state drives (SSD). File scans are up to 30 percent faster using DS8870 zHPF support for sequential access methods (QSAM, BPAM, and BSAM).
VMware vStorage APIs for Array Integration (VAAI) support. Why should the IBM DS8000 series support VMware when IBM already offers great VMware support with SAN Volume Controller (SVC), Storwize V7000 and XIV storage systems? Good question. This was hotly debated between development and marketing. Several DS8000 customers have already added SVC to provide full VMware VAAI support. As a consultant, I am neither development nor marketing, but felt it necessary to weigh in with my opinion. The DS8000 is a consolidation platform. According to one analyst survey, 22 percent of companies run on a single disk platform, so for DS8000 to be the one, it needs to support VMware and exploit these special APIs.
Six Nines Availability. Critical enterprise systems need to deliver continuous data availability, or very close to it. IBM solutions can help deliver up to six “nines” of availability, or 99.9999 percent, when combining DS8000 Metro Mirror and GDPS Hyperswap. That's only about 32 seconds of downtime per year.
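For anyone who wants to check the arithmetic, here is a quick sketch that converts an availability percentage into the downtime it allows per year. It assumes a 365-day year, and the function name is just for illustration. Six nines works out to roughly 31.5 seconds.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a non-leap year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime implied by an availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for nines in ("99.9", "99.99", "99.999", "99.9999"):
    seconds = downtime_seconds_per_year(float(nines))
    print(f"{nines}% -> {seconds:,.1f} seconds per year")
```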
The TS7700 Release 3 represents a refresh to our existing virtual tape libraries. These are mainframe-only, offered in two models: TS7720 is a disk-only device, and the TS7740 is a blended disk-and-tape solution.
Industry standard hardware encryption. This applies to user data stored on the TS7700 system cache (disk), and for data transferred between TS7700 systems. This is especially important for regulations, like Payment Card Industry Data Security Standard (PCI-DSS). In previous models, the data would not be encrypted until it was moved off disk and written to tape. Now, it is encrypted the minute it lands on the disk cache, and stays encrypted as it is replicated from one TS7700 to another in the grid.
Up to 4 Million logical volume capacity. This is twice the previous support.
More physical capacity for TS7720 systems. The maximum capacity for the disk-only model is raised from 440TB to 620TB, representing a 40 percent increase.
My latest book "Inside System Storage: Volume V" is now available!
I have published my fifth volume in my "Inside System Storage" series! Currently, it is only available in Paperback. My editor, Susan Pollard, is hoping to have the eBook and Hardcover versions ready for Cyber Monday. The foreword was written by Dr. Sondra Ashmore.
You can order this, and all my other books, in all formats, directly from my [Author Spotlight] page. The paperback will also be available soon from other online booksellers, search for ISBN 978-1-300-26223-7.
Improved Scalability. A new Multi-system Manager (MSM) server reduces the operational complexity for large and multi-site XIV deployments. Previously, admins connected directly to XIV boxes. If you had 10 admins logged in, then every XIV box was managing 10 admin conversations. The new MSM acts as a go-between. The admins connect to the MSM, and the MSM connects to the XIV boxes. The MSM polls and caches the status of each XIV, greatly increasing the number of XIV boxes that an admin can manage.
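To see why a go-between helps, here is a toy model of the poll-and-cache idea. It is not the MSM's actual design or API; the class and method names are hypothetical, and the "arrays" are simple stand-ins that count how often they are queried.

```python
class FakeArray:
    """Stand-in for one storage system; counts how often it is queried."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.queries = 0

    def status(self) -> dict:
        self.queries += 1
        return {"name": self.name, "state": "ok"}


class ManagerCache:
    """Go-between: polls every array once per cycle, serves admins from cache."""

    def __init__(self, arrays: list) -> None:
        self.arrays = arrays
        self.cache: dict = {}

    def poll(self) -> None:
        for array in self.arrays:
            self.cache[array.name] = array.status()

    def status_for_admin(self, array_name: str) -> dict:
        return self.cache[array_name]  # answered from cache, no round trip


arrays = [FakeArray(f"xiv{i}") for i in range(3)]
manager = ManagerCache(arrays)
manager.poll()
for _admin in range(10):  # ten admins each look at every array
    for array in arrays:
        manager.status_for_admin(array.name)
print([a.queries for a in arrays])  # each array was queried once, not ten times
```

The load on each array now depends on the polling interval, not on how many admins are logged in.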
Enhanced User Interface. We added support for IPsec and U.S. Government (USGv6) certification for administering the XIV over IPv6 networks. The XIV Mobile Dashboard app for iPhone and iPad is spiffed up. Finally, the GUI has been internationalized and translated to the Japanese language.
Enhanced Integration for Cloud. For OpenStack, XIV now offers a Nova-volume driver which provides persistent storage to OpenStack compute nodes. The Nova task force is now looking to move storage into its own project called Cinder. For VMware, XIV has full support for Site Recovery Manager (SRM) v4.1 and v5.0 releases. XIV now also supports the Microsoft System Center Virtual Machine Manager, which can manage Hyper-V, VMware and Citrix XenServer hypervisors.
Smaller entry point. The original XIV supported 1TB and 2TB drives, with the smallest offering being 27TB usable. When IBM introduced the XIV Gen3, the two choices were 2TB and 3TB disk drives. Unfortunately, this meant that the initial entry model was now 55TB in size, and each additional module would be more expensive as well. IBM is now going to offer 1TB support for XIV Gen3 for a lower price point. These are actually 2TB drives with half the capacity turned off.
This year marks the 10 year anniversary of IBM's introduction of LTO tape technology. IBM is a member of the Linear Tape Open consortium which consists of IBM, HP and Quantum, referred to as "Technology Provider Companies" or TPCs. In an earlier job role, I was the "portfolio manager" for both LTO and Enterprise tape product lines.
Today, we held a celebration in Tucson, with cake and refreshments.
IBM Executives Doug Balog, IBM VP of Storage Platform, and Sanjay Tripathi, the new IBM Director and Business Line Executive for Tape, VTL and Archive systems, presented the successes of LTO tape over the past 10 years.
To date, over 3.5 million LTO tape drives and over 150 million LTO tape media cartridges have been shipped, which is a testament to the remarkable marketplace acceptance of the technology.
In honor of this event, I decided to interview Bruce Master, IBM Senior Program Manager for Data Protection Systems, about this 10 year anniversary.
10 years of LTO technology is a great milestone. How is this especially significant to IBM and its clients?
According to IDC data, IBM has held the #1 position in market share for total worldwide branded tape revenue for over 7 years, and is still #1 in branded midrange tape revenue, which includes the LTO tape technologies. IBM was the first drive manufacturer to deliver LTO-1 drives, back in September 2000, the first to deliver tape drive encryption to the marketplace on LTO-4 drives, and is shipping LTO generation 5 drives and libraries. IBM is the author of the new Linear Tape File System (LTFS) specification that has been adopted by the TPCs. This file system revolutionizes how tape can be used, as if it were a giant 1.5 terabyte removable USB memory stick, with the capability to be accessed with directory tree structures and drag and drop functionality. With LTO's built-in real-time compression, a single tape cartridge can hold up to 3TB of data.
The Linear Tape File System has been getting a lot of attention. Where can we learn more about it?
Why is tape still a critical part of a storage infrastructure?
Tape is low cost and provides critical off-line portable storage to help protect data from attacks that can occur with on-line data. For instance, on-line data is at risk of attack from a virus, hacker, system error, disgruntled employee, and more. Since tape is off-line, not accessible by the system, it protects against these forms of corruption. LTO technology also provides write-once read-many (WORM) tape media to help address compliance issues that specify non-erasable, non-rewriteable (NENR) storage, hardware encryption to secure data, as well as a low cost long term archive media. When data cools off, or becomes infrequently accessed, why keep it on spinning disk? Move it to tape where it is much greener and lower cost. A tape in a slot on a shelf consumes minimal energy.
So tape is not dead?
Ha! Far from it. Seems like disk-only "specialty shop" storage vendors that don’t have tape in their sales portfolio are the ones that propagate that myth. In reality, storage managers are tasked with meeting complex objectives for performance, compliance, security, data protection, archive and total cost of ownership. Optimally, a blend of disk and tape in a tiered infrastructure can best address these objectives. You can’t build a house with just a hammer. IBM has a rich tool kit of storage offerings including disk, tape, software, services and deduplication technologies to help clients address their needs.
Do you have an example of a client who was saved by tape?
Yes indeed. Estes Express, a large trucking firm, was hit by a hurricane that flooded their data center and destroyed all systems. Fortunately, the company survived because the night before they had backed up all data on to IBM tape and moved the cartridges offsite! Estes has since implemented a best practices data protection strategy with a combination of disk-to-disk-to-tape (D2D2T) using LTO tape at the primary site, and a remote global mirrored site that is also backed up to LTO tape.
So tape saved the day. What is the outlook for tape innovation in the future?
The future is bright for tape. Earlier this year, IBM and Fujifilm were able to [demonstrate a tape density achievement] that could enable a native 35TB tape cartridge capacity! This shows a long roadmap ahead for tape and a continued good night’s sleep for storage managers knowing that their precious data will be safe.
Of course, LTO tape is just one of the many reasons IBM is a successful and profitable leader in the IT storage industry. Doug Balog talked about his experiences in London for the [October 7th launch] of IBM DS8800, Storwize V7000 and SAN Volume Controller 6.1. Sanjay Tripathi showed recent successes with IBM's ProtecTIER Data Deduplication Solution and Information Archive products.
I would like to thank Bruce Master for his time in completing this interview. To learn more about IBM tape and storage offerings, visit [ibm.com/storage].
A lot was announced yesterday, so I decided to break it up into several separate posts. This is part 2 in my 3-part series, focusing on: Storwize V7000 Unified, LTO-6 tape, and the SmartCloud Virtual Storage Center.
The Storwize V7000 Unified is a product that consists of a 2U-high Storwize V7000 control enclosure that provides block-based access, combined with two 2U-high File Modules that provide file-based NAS protocols: CIFS, NFS, HTTPS, SCP and FTP. The problem was that when it was introduced, it was based on Storwize V7000 v6.3, so when the Storwize V7000 v6.4 features were announced last June, they did not apply to the Storwize V7000 Unified.
That is all fixed now, so the Storwize V7000 Unified now supports the full v6.4 features, including Real-time Compression for both file and block-based access to primary data, and Fibre Channel over Ethernet (FCoE) for block access.
The two File Modules are no longer limited to a single Storwize V7000 control enclosure. You can now connect to up to four control enclosures clustered together. Combining these with up to nine expansion enclosures each for additional disk raises the total maximum to 960 drives.
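Here is where the 960 comes from, as a small sketch. The 24 drives per enclosure is my assumption (the 2.5-inch drive enclosure models), not something stated in the announcement, and the function name is just for illustration.

```python
DRIVES_PER_ENCLOSURE = 24  # assumption: 2.5-inch drive enclosures


def max_drives(control_enclosures: int, expansions_per_control: int) -> int:
    """Total drive bays when every control enclosure has its full set of expansions."""
    enclosures = control_enclosures * (1 + expansions_per_control)
    return enclosures * DRIVES_PER_ENCLOSURE


print(max_drives(1, 9))  # 240, a single control enclosure with nine expansions
print(max_drives(4, 9))  # 960, four control enclosures clustered together
```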
If you don't already have an Active Directory or LDAP server, the Storwize V7000 Unified now offers an embedded LDAP server, for smaller deployments that want to reduce the number of servers they need to purchase for a complete solution.
Like the [IBM XIV Gen3 storage system], both the Storwize V7000 and V7000 Unified now also support the OpenStack Nova-volume interface.
Lastly, if you have a Storwize V7000 v6.4, you can upgrade it to a Storwize V7000 Unified by simply adding the two File Modules. This can be done in the field.
IBM LTO-6 for tape libraries and drives
IBM introduces the sixth generation of Linear Tape Open (LTO-6) drives, which can be used as stand-alone IBM TS1060 drives, or in IBM tape libraries. As with previous models of LTO, the LTO-6 can read tape media from two older generations (LTO-4 and LTO-5), and can write to previous generation (LTO-5) tape media. You can buy the LTO-6 drives now, and use the older media until LTO-6 tape cartridges are available (hopefully later this year!).
My friend, Brad Johns, from Brad Johns Consulting, has a great post on this [LTO-6 Announcement]. While you expect the new drives to be faster with a denser tape media format, the key advantage to the LTO-6 is that it improves the compression algorithm, from the previous 2:1 to the new 2.5:1 compression ratio:
Thus, with the improved compression, the LTO-6 is 40 percent faster, with double the tape cartridge density. This can reduce backup times by 30 percent, increase the amount of data that sits in your automated tape libraries, and reduce the courier costs sending tapes off-site.
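The "40 percent faster" and "double the density" figures can be reproduced with a little arithmetic. In this sketch, the native capacities and data rates are assumptions taken from the published LTO-5 and LTO-6 specifications rather than from the announcement itself. The compression ratios are the 2:1 and 2.5:1 mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LtoGeneration:
    name: str
    native_tb: float
    native_mb_per_s: float
    compression_ratio: float

    @property
    def compressed_tb(self) -> float:
        return self.native_tb * self.compression_ratio

    @property
    def compressed_mb_per_s(self) -> float:
        return self.native_mb_per_s * self.compression_ratio


# Native figures are assumptions from the published LTO specifications.
LTO5 = LtoGeneration("LTO-5", native_tb=1.5, native_mb_per_s=140, compression_ratio=2.0)
LTO6 = LtoGeneration("LTO-6", native_tb=2.5, native_mb_per_s=160, compression_ratio=2.5)

capacity_gain = LTO6.compressed_tb / LTO5.compressed_tb
speed_gain = LTO6.compressed_mb_per_s / LTO5.compressed_mb_per_s
print(f"capacity: {LTO5.compressed_tb} TB -> {LTO6.compressed_tb} TB ({capacity_gain:.2f}x)")
print(f"throughput: {LTO5.compressed_mb_per_s:.0f} -> {LTO6.compressed_mb_per_s:.0f} MB/s ({speed_gain:.2f}x)")
```

Under these assumptions, compressed capacity goes from 3 TB to 6.25 TB per cartridge, and compressed throughput from 280 to 400 MB/s, which is a little over 40 percent faster.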
IBM SmartCloud Virtual Storage Center v5.1
Last year, IBM coined the phrase "Storage Hypervisor" to refer to the underlying technology in the IBM SAN Volume Controller (SVC) and Storwize V7000 disk systems.
At the IBM Edge conference last June, my colleague Mike Griese presented [SmartCloud Virtual Storage Center]. Back then, it was a pilot program (beta test), and this week, IBM announces that it will be formally available as a product.
The idea was simple: take the basic storage hypervisor, and add the necessary software to make it a complete solution.
If all of your disk is currently virtualized behind IBM SAN Volume Controller (SVC), or you want to put all of your data behind SVC, then SmartCloud Virtual Storage Center is for you. Basically, for one per-TB price, you get all of the following:
The software features of SAN Volume Controller v6.4, including FlashCopy, Metro Mirror and Global Mirror.
The full advanced features of IBM Tivoli Storage Productivity Center v5.1, including the Storage Analytics Engine that does "Right-Tiering", recommending which LUNs should be moved entirely from one disk system to another, based on policies and access patterns.
IBM Tivoli Storage FlashCopy Manager v3.2 which manages FlashCopy with full coordination with applications, including Microsoft Exchange, SQL Server, DB2, Oracle, SAP, and VMware. This ensures that the FlashCopy destination copies are clean, eliminating the need to run backout or redo logs to correct any incomplete units of work.
If this combination sounds familiar, it was based on IBM's previous attempt called [Rapid Application Storage] which combined the Storwize V7000 with Tivoli Storage Productivity Center Midrange Edition and FlashCopy Manager.
The key difference is that SmartCloud VSC does not include the SVC hardware itself. You buy this separately. If you want Real-time Compression, that is charged separately for the subset of TB of the volumes that you select for compression.
The keynote was led by Phil Tasker, IBM Business Unit Executive (BUE) for STG Education Programs in Growth Markets, then Joe Screnci, head of IBM Storage Sales for Australia. IBM is in the Top 10 Training Hall of Fame, and conducts over 40,000 classes worldwide, resulting in over 1.3 million student days of instruction. IBM Systems Lab and Training hosts over three dozen technical conferences like this one every year.
Next was Clod Barrera, Distinguished Engineer and Chief Technical Strategist for the IBM System Storage product line. He covered future trends in storage as they relate to IBM's Smarter Computing initiative.
Storage for the Clouds
Clod Barrera presented this break-out session on Cloud Storage. He covered why clouds matter, the various types and purposes of cloud, technology and architectures, and where IBM is headed to support this trend.
Storage for Cloud computing was a $1 billion USD business in 2010, and is expected to grow at a 32 percent CAGR, compared to 3.8 percent for non-cloud storage. Clod estimates that 10 to 15 percent of all storage will be in cloud deployments by 2015. Of this storage, analysts expect 50 percent in private clouds, and the other 50 percent in public clouds. For private clouds, clients are looking to "Cloudify" their existing IT infrastructures. For public clouds, the projects are mostly green field.
IBM is also looking to be the "arms dealer" of choice for Telcos and other companies looking to launch their own Cloud Services. IBM has a Cloud Services Provider Platform (CSP2) specifically to provide all the tools and technologies needed to make this possible.
Last month, IBM launched several new solutions for Cloud. The IBM Starter Kit for Cloud will help existing IT environments adopt cloud technologies. The IBM Service Agility Accelerator for Cloud is available for more advanced deployments. IBM Service Delivery Manager (ISDM) integrates a collection of software to provide complete integrated service management. IBM CloudBurst provides an integrated hardware-and-software stack for both x86 and POWER chipsets.
Multi-tenancy is also a big issue, and this varies depending on deployment model: IaaS, PaaS, or SaaS. Multi-tenancy is needed to help divide up management tasks, and to ensure that shared resources are paid for and meet SLA requirements accordingly.
Clod feels there are good reasons to use high performance, transactional SAN storage for VMware environments, versus NAS which many people consider simpler to deploy. IBM is also active in open standards, including SNIA's Cloud Data Management Interface [CDMI].
Journey to the Private Cloud
Gary Luke from Brocade provided this session on IBM's SAN384B-2 and SAN768B-2 SAN directors. Brocade is one of IBM's suppliers for SAN switches, and thanks to TRILL being adopted last August by IETF, supports multi-hop FCoE configurations! However, Gary did not talk about FCoE, but rather native FCP and FICON support in these new directors.
According to VMware, only 30 percent of x86 workloads are virtualized by any hypervisor. Gary feels that server virtualization and the use of Solid-State Drives (SSD) in disk arrays are driving existing 8 Gbps SAN to upgrade to 16 Gbps. Gary feels that Fibre-Channel based SANs are best positioned to handle unpredictable peaks in a 24-by-7 world.
The SAN384B-2 can house up to 256 ports (8 Gbps) or 192 ports (16 Gbps) in a four-slot, 9U chassis. The SAN768B-2 can handle twice these, in a 12U chassis. The nice thing about the 16 Gbps ports is that they can auto-negotiate down to 10, 8, 4 and 2 Gbps. This is far better than the typical "N-2" support, in which a port handles only its own speed and the two generations before it, such as 4/2/1 or 8/4/2. An upcoming FOS release will allow people with previous generation SAN384B-1/SAN768B-1 directors to move their 8 Gbps blades over to the new SAN384B-2/SAN768B-2 generation models.
Since most CWDM and DWDM equipment supports a maximum of 10 Gbps FC and 10GbE, Brocade's 16 Gbps ports can automatically drop down to 10 Gbps for direct attachment to CWDM/DWDM, rather than requiring the step-down box that is normally needed.
A major advancement is the change from copper to optical "Inter-Chassis Links" (ICL). Unlike Inter-switch links (ISL) that use up SAN ports on each box, the ICL is faster, more efficient and does not consume ports. Normally, clients would connect two directors together, but now you can connect up to six chassis together! For example, you can have four SAN384B-2 connected to your host servers, ICL attached to two SAN768B-2, that are then connected to your disk and tape storage devices. The fiber optic ICLs allow for up to 50 meters distance. Combining six chassis together would allow the complex to support over 3,000 ports (8 Gbps) or 2,300 ports (16 Gbps).
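The port totals are easy to work out from the per-chassis numbers given earlier. The sketch below is just that arithmetic, with an illustrative function name. Note that the "over 3,000" figure corresponds to six of the larger SAN768B-2 chassis; the mixed four-plus-two example comes out smaller.

```python
PORTS = {
    # model: (ports at 8 Gbps, ports at 16 Gbps)
    "SAN384B-2": (256, 192),
    "SAN768B-2": (512, 384),  # twice the SAN384B-2
}


def complex_ports(chassis: list[str]) -> tuple[int, int]:
    """Total (8 Gbps, 16 Gbps) port counts for chassis joined over ICLs."""
    if len(chassis) > 6:
        raise ValueError("at most six chassis can be joined over ICLs")
    eight = sum(PORTS[model][0] for model in chassis)
    sixteen = sum(PORTS[model][1] for model in chassis)
    return eight, sixteen


print(complex_ports(["SAN768B-2"] * 6))                      # (3072, 2304)
print(complex_ports(["SAN384B-2"] * 4 + ["SAN768B-2"] * 2))  # (2048, 1536)
```

Because ICLs do not consume SAN ports, no ports need to be subtracted for the links between chassis.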
The SAN384B-2 and SAN768B-2 support "virtual SAN" logical switches, traffic isolation (TI) zones, fabric-assigned WWNNs, and fabric-based QoS.
Lastly, Brocade offers a free utility called [SANhealth] that will gather data from your b-type, m-type and even Cisco MDS-based SAN. The data can then be sent to Brocade for analysis, and Brocade will then email back some nice Visio graphs, spreadsheets and other analysis results on the health of your SAN.
We've been quite busy here at the Tucson Executive Briefing Center. I am often asked to explain the relationship between IBM's various storage products. While automakers don't have to explain why they sell sports coupes, pickup trucks and minivans, this analogy does not adequately cover IT storage products. So, I have come up with a new analogy that seems to be a better fit: foundations and flavorings.
All over the world, meals are often comprised of a foundation, perhaps rice, potatoes or pasta, covered with some form of flavoring, sauces, pieces of meat or fish, grated cheese and spices. In Puerto Rico, I had dishes where the foundation was mashed bananas called [plantains]. Sandwich shops often let you pick your choice of bread, the foundation, and then your meats and cheeses, the flavorings. At our local steakhouse, [McMahon's], the menu lists a set of steaks, the foundation such as Rib Eye, Filet Mignon, Prime Rib or New York Strip, and various flavorings, such as sauces and rubs to cover the steak. Last night, I had the Delmonico steak with the Cristiani sauce consisting of Portobello mushrooms, garlic and aged Romano cheese.
This serves as a useful analogy for IBM's storage strategy. Allowing the foundations and flavorings to be separately orderable greatly simplifies the selection menu and provides a nearly any-to-any approach to meeting a variety of client needs. Let's take a look at both.
IBM's foundation products are the DS family [DS3000, DS4000, DS5000, DS6000 and DS8000 series], [DS9900 series], and [XIV] for disk, and the TS family [TS1000, TS2000, TS3000] series for tape drives and libraries. In much the same way you might prefer brown rice instead of white rice, or linguine instead of penne pasta, you might find the attributes of one storage foundation more attractive based on its performance, scalability and availability features for your particular application workloads.
Fellow IBM blogger Barry Whyte discusses SVC at great length on his [Storage Virtualization] blog. Flavoring disk foundation storage with SAN Volume Controller can provide you additional features and functions, and help improve the scalability, performance or availability characteristics. For example, if you have DS4000, DS8000 and XIV, you might use SVC to provide a consistent methodology for asynchronous replication, a form of consistent "flavoring" if you will.
N series Gateways
The [N series gateways] offer flavoring to disk foundation, including unified NAS, iSCSI and FCP protocol host attachment, and application aware capabilities. (As for our IBM N series appliances or "filers", these could be foundational storage behind an SVC, but that's perhaps a topic for another post.)
SoFS provides a global namespace with clustered NAS access to files. This is a blended disk-and-tape solution with built-in backup and Information Lifecycle Management [ILM]. Policies can be used to place different files onto different tiers of storage, automate the movement from tier to tier, including migration to tape, and even expiration when the data is no longer needed.
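To make the policy idea concrete, here is a minimal sketch of age-based placement. This is not SoFS policy syntax. The tier names, thresholds and function are all hypothetical, chosen only to show placement, movement to tape, and eventual expiration driven by one rule table.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    days_since_access: int


# Hypothetical thresholds, ordered from hottest to coldest tier.
POLICY = [
    (30, "fast-disk"),
    (180, "capacity-disk"),
    (2555, "tape"),  # roughly seven years
]


def place(file: FileInfo) -> str:
    """Return the tier a file belongs on, or 'expire' once past every threshold."""
    for max_age_days, tier in POLICY:
        if file.days_since_access <= max_age_days:
            return tier
    return "expire"


for age in (5, 90, 1000, 3000):
    print(age, "->", place(FileInfo(f"/data/file{age}", age)))
```

Running the same function periodically over every file is what turns a placement rule into automated migration from tier to tier.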
The [IBM System Storage DR550] provides Non-erasable, Non-rewriteable (NENR) flavoring to storage. While the DR550 comes with internal disk storage, it can front end a tape library filled with WORM cartridges. The DR550 has been paired up with small libraries (TS3200 or TS3310) as well as larger libraries like the TS3500.
The IBM Grid Medical Archive Solution [GMAS] provides a variety of capabilities for storing and accessing medical images, using a blended disk-and-tape approach. This allows hospital and clinic networks to provide access for doctors and radiologists from multiple locations.
Many of the flavorings are called "gateways". The IBM TS7650G flavors disk to provide a virtual tape library [VTL] with inline data deduplication capability. Recent performance tests pairing the TS7650G flavoring with XIV foundation storage found this combination to be an excellent match.
Let me know what you think. Does this help you understand IBM's storage strategy and acquisitions? Enter your comments below.
Can Structured Query Language [SQL] be considered a storage protocol?
Several months ago, I was asked to review a book on SQL, titled appropriately enough "The Complete Idiot's Guide to SQL", by Steven Holzner, Ph.D. As a published author myself, I get a lot of these requests, and I agreed in this case, given that SQL was invented by IBM, and is a good fundamental skill to have for Business Analytics and Database Management.
(FTC Disclosure: I work for IBM but was not part of the SQL development team. I was provided a copy of this book for free to review it. I was not paid to mention this book, nor told what to write. I do not know the author personally nor anyone that works for his publicist. All of my opinions of the book in this blog post are my own.)
Despite an agreed-upon standard for SQL, each relational database management system (RDBMS) has decided to customize it for their own purposes. First, SQL can be quite wordy, so some RDBMS have made certain keywords optional. Second, RDBMS offer extra features by adding keywords or programming language extensions, options or parameters above and beyond what the SQL standard calls for. Third, the SQL standard has changed over the years, and some RDBMS have opted to keep some backward compatibility with their prior releases. Fourth, some RDBMS want to discourage people from easily porting code from one RDBMS to another, known in the industry as vendor lock-in.
Throughout my career, I have managed various databases, including Informix, DB2, MySQL, and Microsoft SQL Server, so I am quite familiar with the differences in SQL and the problems and implications that arise.
Most authors who want to write about SQL typically make a choice between (a) sticking to the SQL standard, and expecting the reader to customize the examples to their particular RDBMS; or (b) sticking to a single RDBMS implementation, and offering examples that may not work on other RDBMS.
I found the book "The Complete Idiot's Guide to SQL" covered the basics quite well, but with an odd twist. The basics include creating databases and tables, defining columns, inserting and deleting rows, updating fields, and performing queries or joins. The odd twist is that Steven does not make the typical choice above, but rather shows how the various RDBMS differ from standard SQL syntax, with actual working examples for different RDBMS.
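To give a feel for the kind of difference involved, here is one well-known case: asking for only the first few rows of a table. This example is mine, not taken from the book, and the helper function is purely illustrative. It builds the query text in each product's own row-limiting syntax.

```python
def first_rows_query(table: str, n: int, dialect: str) -> str:
    """Build a 'first n rows' query in the row-limiting syntax of each product."""
    if dialect in ("standard", "db2"):
        return f"SELECT * FROM {table} FETCH FIRST {n} ROWS ONLY"
    if dialect == "mysql":
        return f"SELECT * FROM {table} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP {n} * FROM {table}"
    raise ValueError(f"unknown dialect: {dialect}")


for dialect in ("standard", "db2", "mysql", "sqlserver"):
    print(f"{dialect:10} {first_rows_query('orders', 5, dialect)}")
```

The string formatting here is for illustration only. Real code should never build SQL from untrusted input this way.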
You might be thinking to yourself that only an idiot would work in a place that had to require knowledge of multiple RDBMS. The sad truth is that most of the medium and large companies I speak to have two or more in production. This is either through acquisitions, or in some cases, individual business units or departments implementing their own via the [Shadow IT].
(For those who want to learn SQL and try out the examples in this book, IBM offers a free version of DB2 called [DB2 Express-C] that runs on Windows, Linux, Mac OS, and Solaris.)
Last week, while I was in Russia for the [Edge Comes to You] event, I was interviewed by a journalist from [Storage News] on various topics. One question struck me as strange. He asked why I did not mention IBM's acquisition of Netezza in my keynote session about storage. I had to explain that Netezza was not in the IBM System Storage product line. It is in a different group, under Business Analytics, where it belongs.
While it is true that Netezza can store data, because it has storage components inside, the same could also be said about nearly every other piece of IT equipment, from servers with internal disk, to digital cameras, smart phones and portable music players. They can all be considered storage devices, but doing so would undermine what differentiates them from one another.
Which brings me back to my original question: Should we consider SQL to be a storage protocol? For the longest time, IT folks only considered block-based interfaces as storage protocols, then we added file-based interfaces like CIFS and NFS, and we also have object-based interfaces, such as IBM's Object Access Method (OAM) and the System Storage Archive Manager (SSAM) API. Could SQL interfaces be the next storage protocol?
Let me know what you think on this. Leave a comment below.
Well, I'm back safely from my tour of Asia. I am glad to report that Tokyo, Beijing and Kuala Lumpur are pretty much how I remember them from the last time I was there in each city. I have since been fighting jet lag by watching the last thirteen episodes of LOST season 6 and the series finale.
Recently, I have started seeing a lot of buzz on the term "Storage Federation". The concept is not new, but rather based on the work in database federation, first introduced in 1985 by [A federated architecture for information management] by Heimbigner and McLeod. For those not familiar with database federation, you can take several independent autonomous databases, and treat them as one big federated system. For example, this would allow you to issue a single query and get results across all the databases in the federated system. The advantage is that it is often easier to federate several disparate heterogeneous databases than to merge them into a single database. [IBM InfoSphere Federation Server] is a market leader in this space, with the capability to federate DB2, Oracle and SQL Server databases.
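Here is a toy illustration of the "single query, many databases" idea. It is a minimal sketch under stated assumptions, not how a real federation server works: both member databases are in-memory SQLite stand-ins, and the table, names and helper functions are made up for the example.

```python
import sqlite3


def make_db(rows):
    """Create one independent in-memory database with a small customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


def federated_query(members, sql, params=()):
    """Run one query against every member database and merge the rows."""
    results = []
    for source, conn in members.items():
        for row in conn.execute(sql, params):
            results.append((source, *row))
    return results


members = {
    "sales": make_db([("Alice", "Tucson"), ("Bob", "Austin")]),
    "support": make_db([("Carol", "Tucson")]),
}
rows = federated_query(members, "SELECT name FROM customers WHERE city = ?", ("Tucson",))
print(rows)  # [('sales', 'Alice'), ('support', 'Carol')]
```

Each member stays autonomous and keeps its own data. Only the query and the merged answer cross the boundary, which is the property that makes federation easier than merging everything into one database.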
Storage expansion: You want to increase the storage capacity of an existing storage system that cannot accommodate the total amount of capacity desired. Storage Federation allows you to add additional storage capacity by adding a whole new system.
Storage migration: You want to migrate from an aging storage system to a new one. Storage Federation allows the two systems to be joined, the data to be evacuated from the first system onto the second, and the first system then to be removed.
Safe system upgrades: System upgrades can be problematic for a number of reasons. Storage Federation allows a system to be removed from the federation and be re-inserted again after the successful completion of the upgrade.
Load balancing: Similar to storage expansion, but on the performance axis, you might want to add additional storage systems to a Storage Federation in order to spread the workload across multiple systems.
Storage tiering: In a similar light, storage systems in a Storage Federation could have different capacity/performance ratios that you could use for tiering data. This is similar to the idea of dynamically re-striping data across the disk drives within a single storage system, such as with 3PAR's Dynamic Optimization software, but extends the concept to cross storage system boundaries.
To some extent, IBM SAN Volume Controller (SVC), XIV, Scale-Out NAS (SONAS), and Information Archive (IA) offer most, if not all, of these capabilities. EMC claims its VPLEX will be able to offer storage federation, but only with other VPLEX clusters, which brings up a good question. What about heterogeneous storage federation? Before anyone accuses me of throwing stones at glass houses, let's take a look at each IBM solution:
IBM SAN Volume Controller
The IBM SAN Volume Controller has been doing storage federation since 2003. Not only can IBM SAN Volume Controller bring together storage from a variety of heterogeneous storage systems, the SVC cluster itself can be a mix of different hardware models. You can have a 2145-8A4 node pair, 2145-8G4 node pair, and the new 2145-CF8 node pair, all combined together into a single SVC cluster. Upgrading SVC hardware nodes in an SVC cluster is always non-disruptive.
IBM XIV storage system
The IBM XIV has two kinds of independent modules. Data modules have processor, cache and 12 disks. Interface modules are data modules with additional processor, FC and Ethernet (iSCSI) adapters. Because these two modules play different roles in an XIV "colony", the number of each type is predetermined. Entry-level six-module systems have 2 interface and 4 data modules. Full 15-module systems have 6 interface and 9 data modules. Individual modules can be added or removed non-disruptively in an XIV.
IBM Scale-Out NAS
The SONAS comprises three kinds of nodes that work together in concert: a management node, one or more interface nodes, and two or more storage nodes. The storage nodes are paired to manage up to 240 disk drives in a storage pod. Individual interface or storage nodes can be added or removed non-disruptively in the SONAS. The underlying technology, the General Parallel File System, has been doing storage federation since 1996 for some of the largest top 500 supercomputers in the world.
IBM Information Archive (IA)
For the IA, there are 1, 2 or 3 nodes, which manage a set of collections. A collection can either be file-based using industry-standard NAS protocols, or object-based using the popular System Storage™ Archive Manager (SSAM) interface. Normally, you have as many collections as you have nodes, but nodes are powerful enough to manage two collections to provide N-1 availability. This allows a node to be removed, and a new node added into the IA "colony", in a non-disruptive manner.
Even in an ant colony, there are only a few types of ants, with typically one queen, several males, and lots of workers. But all the ants are red. You don't see colonies that mix between different species of ants. For databases, federation was a way to avoid the much harder task of merging databases from different platforms. For storage, I am surprised people have latched on to the term "federation", given our mixed results in the other "federations" we have formed, which I have conveniently (IMHO) ranked from least effective to most effective:
The Union of Soviet Socialist Republics (USSR)
My father used to say, "If the Soviet Union were in charge of the Sahara desert, they would run out of sand in 50 years." The [Soviet Union] actually lasted 68 years, from 1922 to 1991.
The United Nations (UN)
After the previous League of Nations failed, the UN was formed in 1945 to facilitate cooperation in international law, international security, economic development, social progress, human rights, and the achieving of world peace by stopping wars between countries, and to provide a platform for dialogue.
The European Union (EU)
With the collapse of the Greek economy, and the [rapid growth of debt] in the UK, Spain and France, there are concerns that the EU might not last past 2020.
The United States of America (USA)
My own country is a federation of states, each with its own government. California's financial crisis was compared to the one in Greece. My own state of Arizona is under boycott from other states because of its recent [immigration law]. However, I think the US has managed better than the EU because it has evolved over the past 200 years.
The Organization of the Petroleum Exporting Countries [OPEC]
Technically, OPEC is not a federation of cooperating countries, but rather a cartel of competing countries that have agreed on total industry output of oil to increase individual members' profits. Note that it was a non-OPEC company, BP, that could not "control their output" in what has now become the worst oil spill in US history. OPEC was formed in 1960, and is expected to collapse sometime around 2030 when the world's oil reserves run out. Matt Savinar has a nice article on [Life After the Oil Crash].
United Federation of Planets
The [Federation] fictitiously described in the Star Trek series appears to work well, an optimistic view of what federations could become if you let them evolve long enough.
Given the mixed results with "federation", I think I will avoid using the term for storage, and stick to the original term "scale-out architecture".
I have gotten several emails expressing worry that I have fallen off the face of the earth. The last two weeks have been educational and eye-opening for me. I can't provide details in my blog, so I will just say that it involved government agencies that IBM refers to as "dark accounts", and that I am now back safely in the USA. Between adjusting to time zone differences, ridiculously long hours, and restricted access to the internet, I was unable to blog lately.
Instead, I will resume my coverage of the [IBM System Storage Technical University 2011]. The "Solutions Expo" runs Monday evening through Wednesday lunch. This is a chance for people to explore all the solutions that are part of IBM's large "eco-system" for IBM System Storage and System x products. There were several sponsors for this event.
As is often the case at these conferences, the various booths hand out fun items. The hot items this year were tie-dyed tee-shirts from QLogic, and propeller beanies from the IBM rack and power systems team. Here is Amanda, one of the bartenders, showing off the latter.
After the expo on Tuesday night, my friends at [Texas Memory Systems] held an after-party. Unlike the pens, tee-shirts and keychains at the Expo, these guys had a raffle for real storage products. Here is Erik Eyberg handing out a RamSan PCIe card, valued at $14,000 or so. IBM recently certified the TMS RamSan as External SSD storage for the IBM SAN Volume Controller (SVC). The SVC can optimize performance using this for automated sub-LUN tiering with the IBM System Storage Easy Tier feature.
Over on his Backup Blog, fellow blogger Scott Waterhouse from EMC has a post titled [Backup Sucks: Reason #38]. Here is an excerpt:
Unfortunately, we have not been able to successfully leverage economies of scale in the world of backup and recovery. If it costs you $5 to backup a given amount of data, it probably costs you $50 to back up 10 times that amount of data, and $500 to back up 100 times that amount of data.
If anybody can figure out how to get costs down to $40 for 10 times the amount of data, and $300 for 100 times the amount of data, they will have an irrefutable advantage over anybody that has not been able to leverage economies of scale.
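To put numbers on what "economies of scale" means in this excerpt, here is a small worked sketch. It assumes, purely for illustration, that cost follows a power law (cost = base cost x scale^k); neither Scott's post nor any IBM product documents such a model, and the function name is my own. An exponent of k = 1 means no economies of scale, and anything below 1 means each additional unit of data is cheaper to protect than the last.

```python
import math


def scaling_exponent(base_cost, scaled_cost, scale_factor):
    """Solve scaled_cost = base_cost * scale_factor**k for the exponent k."""
    return math.log(scaled_cost / base_cost) / math.log(scale_factor)


# The linear case from the excerpt: $5 -> $50 -> $500.
linear_10x = scaling_exponent(5, 50, 10)
linear_100x = scaling_exponent(5, 500, 100)

# The hoped-for case from the excerpt: $5 -> $40 -> $300.
target_10x = scaling_exponent(5, 40, 10)
target_100x = scaling_exponent(5, 300, 100)

print(f"linear: k = {linear_10x:.2f} and {linear_100x:.2f}")
print(f"target: k = {target_10x:.2f} and {target_100x:.2f}")
```

Under this assumed model, the $40 and $300 targets work out to an exponent of roughly 0.9, so even a modest bend away from a straight line is enough to produce the savings the excerpt describes.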
I suspect that where Scott mentions "we" in the above excerpt, he is referring to EMC in general, with products like Legato. Fortunately, IBM has scalable backup solutions, using either a hardware approach, or one purely with software.
The hardware approach involves using deduplication hardware technology as the storage pool for IBM Tivoli Storage Manager (TSM). Using this approach, IBM Tivoli Storage Manager would receive data from dozens, hundreds or even thousands of client nodes, and the backup copies would be sent to an IBM TS7650 ProtecTIER data deduplication appliance, IBM TS7650G gateway, or IBM N series with A-SIS. In most cases, companies have standardized on the operating systems and applications used on these nodes, and multiple copies of data reside across employee laptops. As a result, as you have more nodes backing up, you are able to achieve benefits of scale.
Perhaps your budget isn't big enough to handle new hardware purchases at this time, in this economy. Have no fear, IBM also offers deduplication built right into the IBM Tivoli Storage Manager v6 software itself. You can use a sequential-access disk storage pool for this. TSM scans the backup copies, as well as archive and HSM data, identifies duplicate chunks of data, and reclaims the space when duplicates are found.
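To show why deduplication gets better as more nodes back up similar data, here is a minimal sketch of chunk-level deduplication. It is not TSM's actual algorithm: the fixed four-byte chunk size, the SHA-256 fingerprint, the in-memory dictionary and the function names are all assumptions chosen to keep the example short, and this sketch deduplicates as data arrives rather than scanning the storage pool afterwards.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, chosen so the example is easy to follow


def store_backup(data, chunk_store):
    """Split data into fixed-size chunks and keep only chunks not seen before.

    Returns a "recipe": the ordered list of chunk fingerprints needed to
    rebuild the original data.
    """
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # stored once, however often seen
        recipe.append(digest)
    return recipe


def restore_backup(recipe, chunk_store):
    """Rebuild the original data from its recipe."""
    return b"".join(chunk_store[digest] for digest in recipe)


chunk_store = {}
laptop_a = b"AAAABBBBCCCC"  # two laptops sharing most of their data
laptop_b = b"AAAABBBBDDDD"
recipe_a = store_backup(laptop_a, chunk_store)
recipe_b = store_backup(laptop_b, chunk_store)
```

The two backups reference six chunks between them, but only four unique chunks are actually stored, and each backup can still be restored in full from its recipe.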
If your company is using a backup software product that doesn't scale well, perhaps now is a good time to switch over to IBM Tivoli Storage Manager. TSM is perhaps the most scalable backup software product in the marketplace, giving IBM an "irrefutable advantage" over the competition.
Continuing my coverage of the IBM Dynamic Infrastructure Executive Summit at the Fairmont Resort in Scottsdale, Arizona, we had a day full of main-tent sessions. Here is a quick recap of the sessions presented in the morning.
Leadership and Innovation on a Smarter Planet
Todd Kirtley, IBM General Manager of the western United States, kicked off the day. He explained that we are now entering the Decade of Smart: smarter healthcare, smarter energy, smarter traffic systems, and smarter cities, to name a few. One of those smarter cities is Dubuque, Iowa, nicknamed the Masterpiece of the Mississippi river. Mayor Roy Buol of Dubuque spoke next, giving his testimonial on working with IBM. I have never been to Dubuque, but it looks and sounds like a fun place to visit. Here is the [press release] and a two-minute [video].
Smarter Systems for a Smarter Planet
Tom Rosamillia, IBM General Manager of the System z mainframe platform, presented on smarter systems. IBM is intentionally designing integrated systems to redefine performance and deliver the highest possible value for the least amount of resource. The five key focus areas were:
Enabling massive scale
Organizing vast amounts of data
Turning information into insight
Increasing business agility
Managing risk, security and compliance
The Future of Systems
Ambuj Goyal, IBM General Manager of Development and Manufacturing, presented the future of systems. For example, reading 10 million electricity meters monthly is only 120 million transactions per year, but reading them daily is 3.65 billion, and reading them every 15 minutes will result in over 350 billion transactions per year. What would it take to handle this? Beyond just faster speeds and feeds, beyond consolidation through virtualization and multi-core systems, beyond pre-configured fit-for-purpose appliances, there will be a new level for integrated systems. Imagine a highly dense integration with over 3000 processors per frame, over 400 Petabytes (PB) of storage, and 1.3 PB/sec bandwidth. Integrating software, servers and storage will make this big jump in value possible.
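The meter-reading figures are easy to check. Here is a quick sketch of the arithmetic, assuming one transaction per meter per reading and a 365-day year; the constant and function names are mine, not from the presentation.

```python
METERS = 10_000_000  # ten million electricity meters


def transactions_per_year(readings_per_meter_per_year):
    """One transaction per meter per reading."""
    return METERS * readings_per_meter_per_year


monthly = transactions_per_year(12)                  # 120,000,000
daily = transactions_per_year(365)                   # 3,650,000,000
quarter_hourly = transactions_per_year(365 * 24 * 4) # 350,400,000,000

print(f"{monthly:,} / {daily:,} / {quarter_hourly:,}")
```

Going from monthly to 15-minute readings multiplies the transaction volume by 2,920, which is why faster speeds and feeds alone will not close the gap.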
POWERing your Planet
Ross Mauri, IBM General Manager of Power Systems, presented the latest POWER7 processor server product line. The IBM POWER-based servers can run any mix of AIX, Linux and IBM i (formerly i5/OS) operating system images. Compared to the previous POWER6 generation, POWER7 servers are four times more energy efficient and deliver twice the performance at about the same price. For example, an 8-socket p780 with 64 cores (eight per socket) and 256 threads (4 threads per core) had a record-breaking 37,000 SAP users in a standard SD 2-tier benchmark, beating out 32-socket and 64-socket M9000 SPARC systems from Oracle/Sun and 8-socket Nehalem-EX Fujitsu 1800E systems. See the [SAP benchmark results] for full details. With more TPC-C performance per core, the POWER7 is 4.6 times faster than HP Itanium and 7.5 times faster than Oracle Sun T5440.
This performance can be combined with incredible scalability. IBM's PowerVM outperforms VMware by 65 percent and provides features like "Live Partition Mobility" that is similar to VMware's VMotion capability. IBM's PureScale allows DB2 to scale out across 128 POWER servers, beating out Oracle RAC clusters.
The final speaker in the morning was Greg Lotko, IBM Vice President of Information Management Warehouse solutions. Analytics are required to gain greater insight from information, and this can result in better business outcomes. The [IBM Global CFO Study 2010] shows that companies that invest in business insight consistently outperform all other enterprises, with 33 percent more revenue growth, 32 percent more return on invested capital (ROIC), and 12 times more earnings (EBITDA). Business Analytics is more than just traditional business intelligence (BI). It tries to answer three critical questions for decision makers:
What is happening?
Why is it happening?
What is likely to happen in the future?
The IBM Smart Analytics System is a pre-configured integrated system appliance that combines text analytics, data mining and OLAP cubing software on a powerful data warehouse platform. It comes in three flavors: Model 5600 is based on System x servers, Model 7600 based on POWER7 servers, and Model 9600 on System z mainframe servers.
IBM has over 6000 business analytics and optimization consultants to help clients with their deployments.
While this might appear as "Death by Powerpoint", I think the panel of presenters did a good job providing real examples to emphasize their key points.
Wrapping up my week's coverage of the IBM Pulse 2011 conference, I have had several people ask me to explain IBM's latest initiative, Smarter Computing, which IBM launched this week at this conference. Having led the IT industry through the Centralized Computing era and the Distributed Computing era, IBM is now well-positioned to help companies, governments and non-profit organizations to enter the new Smarter Computing era, focused on insight and discovery.
Centralized Computing (1952 to 1980): Thousands of IT professionals. Efficient, but only the largest companies and governments had them.
Distributed Computing (1981 to 2010): Millions of office workers. Personal computers (PC). Innovative, extending the reach to small and medium-sized businesses, but resulted in server sprawl and increased TCO.
Smarter Computing (2011 and beyond): Billions of people. Smart phones and other handheld devices. Efficient and Innovative, combining the best of centralized and distributed computing.
To help clients with this transition, IBM's Smarter Computing initiative has three main components. This is a corporate-wide strategy, with systems, software and services all working together to realize results.
The first component is Big Data. This combines three different sources of data:
Traditional structured data in OLTP databases and OLAP data warehouses, using data management solutions like DB2 and IBM Netezza.
Unstructured data, including text documents, images, audio, and video, processed with massive parallelism using IBM BigInsights and Apache Hadoop.
Real-Time Analytics Processing (RTAP) of incoming data, including video surveillance, social media, RFID chips, smart meters, and traffic control systems, processed with IBM InfoSphere Streams
Of course, Big Data will bring new opportunities on the storage front, which I will save for a future post!
Rather than general purpose IT equipment, we now have the scale and scope to specialize with systems optimized for particular workloads, the second component of the Smarter Computing initiative. Of course, IBM has been delivering integrated stacks of systems, software and services for decades now, but it is important to remind people of this, as IBM now has a spate of competitors all trying to follow IBM's lead in this arena.
As with Big Data, the focus on Optimized Systems has impacted IBM's strategy on storage as well. I'll save that discussion for a future post as well!
I am glad that nearly all of the storage vendors have standardized to a common definition for Cloud, the third component of Smarter Computing, which shows that this concept has matured:
Cloud computing is a pay-per-use model for enabling network access to a pool of computing resources that can be provisioned and released rapidly with minimal management effort or service provider interaction. -- U.S. National Institute of Standards and Technology [nist.gov]
Of course, Cloud is just an evolution of IBM's Service Bureau business of the 1960s and 1970s, renting out time-sharing on mainframe systems, Grid Computing of the 1980s, and Application Service Providers that popped up in the 1990s. While the [butchers, bakers and candlestick makers] that IBM competes against might focus their efforts on just private cloud or just public cloud, IBM recognizes the reality is that different clients will need different solutions. Rather than rip-and-replace, IBM will help clients transition to cloud via inclusive solutions that adopt a hybrid approach:
Traditional enterprise with private cloud deployments, using solutions like IBM CloudBurst, SONAS and Information Archive
Traditional enterprise with public cloud services to handle seasonal peaks, providing offsite resiliency, and solutions for a mobile workforce
Hybrid clouds that blend private and public cloud services, to handle seasonal peak workloads, remote and branch offices
IBM's emphasis on IT Infrastructure Library (ITIL), Tivoli and Maximo products will play well in this space to provide integrated service management across traditional and cloud deployments. This is why IBM decided to launch the Smarter Computing initiative at the Pulse 2011 conference, the industry's premier conference on integrated service management.
The IBM Watson that competed on Jeopardy! is an excellent example of all three components of Smarter Computing at work.
IBM Watson was able to respond to Jeopardy! clues within three seconds, processing a combination of database searches with DB2 and text-mining analytics of unstructured data with IBM BigInsights.
IBM Watson combined servers, software and storage into an integrated supercomputer that was optimized for one particular workload: playing Jeopardy!
IBM Watson used many technologies prevalent in private and public cloud computing systems, storing its data on a modified version of SONAS, using xCAT administration tools, networking across 10GbE, and massive parallel processing through lots of PowerVM guest images.
In his last post in this series, Barry mentions that the amazingly successful IBM SAN Volume Controller was part of a set of projects:
"IBM was looking for "new horizon" projects to fund at the time, and three such projects were proposed and created the "Storage Software Group". Those three projects became know externally as TPC, (TotalStorage Productivity Center), SanFS (SAN File System - oh how this was just 5 years too early) and SVC (SAN Volume Controller). The fact that two out of the three of them still exist today is actually pretty good. All of these products came out of research, and its a sad state of affairs when research teams are measured against the percentage of the projects they work on, versus those that turn into revenue generating streams."
But this raises the question: Was SAN File System just five years too early?
IBM classifies products into three "horizons": Horizon-1 for well-established mature products, Horizon-2 for recently launched products, and Horizon-3 for emerging business opportunities (EBO). Since I had some involvement with these other projects, I thought I would help fill out some of this history from my perspective.
Back in 2000, IBM executive [Linda Sanford] was in charge of IBM's storage business and presented IBM Research's work on the concept of "Storage Tank", which would hold Petabytes of data accessible to mainframes and distributed servers.
In 2001, I was the lead architect of DFSMS for the IBM z/OS operating system for mainframes, and was asked to be lead architect for the new "Horizon 3" project to be called IBM TotalStorage Productivity Center (TPC), which has since been renamed to IBM Tivoli Storage Productivity Center.
In 2002, I was asked to lead a team to port the "SANfs client" for SAN File System from Linux-x86 over to Linux on System z. How easy or difficult it is to port any code depends on how well it was written with the intent to be ported, and porting the "proof-of-concept" level code proved a bit too challenging for my team of relative new-hires. Once code written by research scientists is sufficiently complete to demonstrate proof of concept, it should be entirely discarded and written from scratch by professional software engineers that follow proper development and documentation procedures. We reminded management of this, and they decided not to make the necessary investment to add Linux on System z as a supported operating system for SAN File System.
In 2003, IBM launched Productivity Center, SAN File System and SAN Volume Controller. These would be lumped together with Horizon-1 product IBM Tivoli Storage Manager and the four products were promoted together as the inappropriately-named [TotalStorage Open Software Family]. We actually had long meetings debating whether SAN Volume Controller was hardware or software. While it is true that most of the features and functions of SAN Volume Controller are driven by its software, it was never packaged as a software-only offering.
The SAN File System was the productized version of the "Storage Tank" research project. While the SAN Volume Controller used industry standard Fibre Channel Protocol (FCP) to allow support of a variety of operating system clients, the SAN File System required an installed "client" that was only available initially on AIX and Linux-x86. In keeping with the "open" concept, an "open source reference client" was made available so that the folks at Hewlett-Packard, Sun Microsystems and Microsoft could port this over to their respective HP-UX, Solaris and Windows operating systems. Not surprisingly, none were willing to voluntarily add yet another file system to their testing efforts.
Barry argues that SANfs was five years ahead of its time. SAN File System tried to bring policy-based management for information, which has been part of DFSMS for z/OS since the 1980s, over to distributed operating systems. The problem is that mainframe people who understand and appreciate the benefits of policy-based management already had it, and non-mainframe people couldn't understand the benefits of something they had managed to survive without.
(Every time I see VMware presented as a new or clever idea, I have to remind people that this x86-based hypervisor basically implements the mainframe concept of server virtualization introduced by IBM in the 1970s. IBM is the leading reseller of VMware, and supports other server virtualization solutions including Linux KVM, Xen, Hyper-V and PowerVM.)
To address the various concerns about SAN File System, the proof-of-concept code from IBM Research was withdrawn from marketing, and fresh code implementing these concepts was integrated into IBM's existing General Parallel File System (GPFS). This software would then be packaged with a server hardware cluster, exporting global file spaces with broad operating system reach. Initially offered as the IBM Scale-out File Services (SoFS) service offering, this was later re-packaged as an appliance, the IBM Scale-Out Network Attached Storage (SONAS) product, and as the IBM Smart Business Storage Cloud (SBSC) cloud storage offering. These now offer clustered NAS storage using the industry standard NFS and CIFS clients that nearly all operating systems already have.
Today, these former Horizon-3 products are now Horizon-2 and Horizon-1. They have evolved. Tivoli Storage Productivity Center, GPFS and SAN Volume Controller are all market leaders in their respective areas.
When I turned on the television last weekend, I saw large waves of water knock down rows of small houses. I thought I had caught the end of a bad Godzilla movie, but sadly it was not movie special effects. Mother Nature can be quite destructive. Over the past four days, Japan has been hit hard by a series of earthquakes and resulting tsunami.
(Note: Disasters can happen anywhere and at any time. Last month, New Zealand had an earthquake as well. It is best to always be prepared. If you haven't done so lately, check out the latest recommendations from the US Government [Ready.Gov] website.)
Several have asked me how this tragedy in Japan might affect IBM and its clients. Here is what I have gathered from various sources. All IBM Japan employees have survived, are safe and reporting no major injuries. IBM has four major facilities near the central part of the country around Tokyo, far from Sendai, the epicenter. All IBM buildings are still standing and operational. A few sections of Tokyo are affected by scheduled brown-outs in an effort to save electricity. Employees are asked to telecommute (a.k.a. work from home) to minimize traffic congestion.
Hakozaki - Headquarters and executive briefing center
Makuhari - Technical Center, where we often hold conferences and other events
Yamato - Research Facility, where R&D is done for IBM tape storage products
Toyosu - Service Delivery Center
I have been to Japan many times throughout my career. Back in the summer of 1995, IBM sent me to Osaka to help out clients in the aftermath of the Great Hanshin earthquake near Kobe. I remember it well, sending an email back to my team saying "It is 1995, and here in Japan it is 95 degrees and 95 percent humidity." It was seven months after the earthquake, but people were still living in cardboard boxes and make-shift tents.
Many people asked if I will be going back to Japan to help out. I speak Japanese, can make sense of the Japanese Katakana characters on computer monitors, and am an expert in Disaster Recovery. However, the IBM Japan team is doing an awesome job helping our clients restore their data and recover their business operations. Of course, if IBM needs me in Japan, I will gladly go, but so far, it doesn't seem that I am needed there.