Full article at BNET
While the performance advantages of SSD storage are clear, the cost is
often prohibitive. But what if you could target only the data that really
needs the performance edge at the SSD drives? You could balance the cost
against IT performance gains that truly help your business perform. Read
this brief from Mesabi Group to see how IBM Storwize® V7000 "users now
have the tools, with the combination of Storage Tier Advisor and Easy
Tier, to be able to plan for and use SSDs appropriately in their
distinctive workload environments."
“Procedures for replacing or adding nodes to an existing cluster”

Scope and Objectives

The scope of this document is twofold. The first section provides a procedure for replacing existing nodes in an SVC cluster non-disruptively. For example, the current cluster consists of two 2145-8F4 nodes, and the goal is to replace them with two 2145-CF8 nodes while keeping the cluster size at two nodes. The second section provides a procedure for adding nodes to an existing cluster to expand it to support additional workload. For example, the current cluster consists of two 2145-8G4 nodes, and the goal is to grow it to a four-node cluster by adding two 2145-CF8 nodes.

The objective of this document is to provide more detail on the steps required to perform the above procedures than is currently available in the SVC Software Installation and Configuration Guide, SC23-6628, located at www.ibm.com/storage/support/2145. In addition, it provides important information to help the person performing the procedures avoid problems while following the various steps.

Section 1: Procedure to replace existing SVC nodes non-disruptively

You can replace SAN Volume Controller 2145-8F2, 2145-8F4, 2145-8G4, and 2145-8A4 nodes with SAN Volume Controller 2145-CF8 nodes in an existing, active cluster without taking an outage on the SVC or on your host applications. In fact, you can use this procedure to replace any model node with a different model node, as long as the SVC software level supports that particular node model type. For example, you might want to replace a 2145-8F2 node in a test environment with a 2145-8G4 node that was previously in production and has just been replaced by a new 2145-CF8 node.
Note: If you are attempting to replace existing 2145-4F2 nodes with new 2145-CF8 nodes, do not use this procedure; you must instead use the procedure written specifically for this type of upgrade, located at the following URL: ftp://ftp.software.ibm.com/storage/san/sanvc/V5.1.0/pubs/multi/4F2MigrationVer1.pdf

This procedure does not require changes to your SAN environment because the new node being installed uses the same worldwide node name (WWNN) as the node you are replacing. Because SVC uses the WWNN to generate the unique worldwide port names (WWPNs), no SAN zoning or disk controller LUN masking changes are required.
IBM® System Storage™ N series with Operations Manager software offers
comprehensive monitoring and management for N series enterprise storage
and content delivery environments. Operations Manager is designed to
provide alerts, reports, and configuration tools from a central control
point, helping you keep your storage and content delivery infrastructure
in line with business requirements for high availability and low total
cost of ownership.
We focus especially on Protection Manager, which is designed as an
intuitive backup and replication management software for IBM System
Storage N series unified storage disk-based data protection
environments. The application is designed to support data protection and
help increase productivity with automated setup and policy-based
This IBM Redbooks® publication demonstrates how Operations Manager
manages IBM System Storage N series storage from a single view and
remotely from anywhere. Operations Manager can monitor and configure all
distributed N series storage systems, N series gateways, and data
management services to increase the availability and accessibility of
their stored and cached data. Operations Manager can monitor the
availability and capacity utilization of all its file systems regardless
of where they are physically located. It can also analyze the
performance utilization of its storage and content delivery network. It
is available on Windows®, Linux®, and Solaris™.
Solid state drives (SSDs) based on flash memory are generating a lot of excitement. This enthusiasm is warranted because flash SSDs demonstrate latencies that are at least 10 times lower than the fastest hard disk drives (HDDs), often enabling response times more than 10X faster. For random read workloads, SSDs may deliver the I/O throughput of 30 or more HDDs while consuming significantly less power per disk. The performance of SSDs can reduce the number of fast-spinning hard disk drives you need in a storage system. Fewer disk drives translate into significant savings in power, cooling, and data center space. This performance benefit comes at a premium; flash SSDs are far more expensive per gigabyte of capacity than HDDs. Therefore, SSDs are best applied in situations that require the highest performance.
The underlying flash memory technology used by SSDs has many advantages, particularly in comparison to DRAM. In addition to storage persistence, these advantages include higher density, lower power consumption, and lower cost per gigabyte. Because of these unique characteristics, NetApp is focusing on the targeted use of flash memory in storage systems and within your storage infrastructure in ways that can deliver the most performance acceleration for the minimum investment.
We are implementing flash memory solutions using SSDs for persistent storage, and we will also use flash memory directly to create expanded read caching devices. Caching can deliver performance that is comparable to or better than SSDs. Because you can complement a large amount of hard disk capacity with a relatively modest amount of read cache, caching is more cost effective for typical enterprise applications. As a result, more people can benefit from the performance acceleration achievable with flash technology.
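To illustrate why a relatively modest read cache can front a much larger pool of hard disk capacity, here is a minimal Python sketch of a generic least-recently-used (LRU) read cache. It is not NetApp's caching design; the class name `LRUReadCache` and the `backend_read` callback (a stand-in for a slow HDD read) are hypothetical. The point is that when a workload re-reads a small "hot" set of blocks, most reads are served from the cache even though it holds only a fraction of the data.

```python
from collections import OrderedDict
from typing import Callable


class LRUReadCache:
    """Hypothetical read cache that keeps the most recently used blocks.

    `backend_read` is an assumed stand-in for a slow HDD read,
    not any vendor's API.
    """

    def __init__(self, capacity_blocks: int, backend_read: Callable[[int], bytes]):
        self.capacity = capacity_blocks
        self.backend_read = backend_read
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id: int) -> bytes:
        if block_id in self.blocks:
            # Cache hit: mark the block as most recently used.
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return self.blocks[block_id]
        # Cache miss: fetch from the slow tier and keep a copy.
        self.misses += 1
        data = self.backend_read(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            # Evict the least recently used block.
            self.blocks.popitem(last=False)
        return data


if __name__ == "__main__":
    def slow_disk_read(block_id: int) -> bytes:
        return f"block-{block_id}".encode()

    cache = LRUReadCache(capacity_blocks=2, backend_read=slow_disk_read)
    # A skewed workload: block 1 is "hot" and re-read often.
    for block in [1, 2, 1, 3, 1, 1, 2]:
        cache.read(block)
    print(cache.hits, cache.misses)  # prints 3 4
```

Real read caches use more sophisticated admission and eviction policies; LRU is used here only because it is the simplest policy that shows the effect.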
You get even more flexibility and value from flash technology by combining it with the NetApp® unified storage architecture, which enables you to leverage your investment in flash memory to simultaneously accelerate multiple applications, whether they use SAN or NAS. Storage efficiency features such as deduplication for primary storage further increase your power, cooling, and space savings.
This white paper is an overview of NetApp’s plan to deliver SSDs (both native and virtualized arrays) plus flash-based read caching and of our ability to further leverage both of these technologies in caching architectures. Selection guidelines are provided to help you choose the right technology to reduce latency and increase your transaction rate while taking into consideration cost versus benefit.
The IBM XIV® Storage System demonstrates how storage can simplify
management and provisioning, with benefits that are especially strong for
virtualized server environments. This means that growth in data does
not have to mean growth in complexity. XIV has a virtualized, grid-based
architecture that enables self-tuning and self-healing, as well as
remarkably simple management and low total cost of ownership.
Cloud security: the grand challenge

In addition to the usual challenges of developing secure IT systems, cloud computing presents an added level of risk because essential services are often outsourced to a third party. The externalized aspect of outsourcing makes it harder to maintain data integrity and privacy, support data and service availability, and demonstrate compliance. In effect, cloud computing shifts much of the control over data and operations from the client organization to its cloud providers, much in the same way organizations entrust part of their IT operations to outsourcing companies. Even basic tasks, such as applying patches and configuring firewalls, can become the responsibility of the cloud service provider, not the user. This means that clients must establish trust relationships with their providers and understand the risk in terms of how these providers implement, deploy, and manage security on their behalf. This "trust but verify" relationship between cloud service providers and consumers is critical because the cloud service consumer is still ultimately responsible for compliance and protection of its critical data, even if that workload has moved to the cloud. In fact, some organizations choose private or hybrid models over public clouds because of the risks associated with outsourcing services.

Other aspects of cloud computing also require a major reassessment of security and risk. Inside the cloud, it is difficult to physically locate where data is stored. Security processes that were once visible are now hidden behind layers of abstraction. This lack of visibility can create a number of security and compliance issues. In addition, the massive sharing of infrastructure with cloud computing creates a significant difference between cloud security and security in more traditional IT environments. Users spanning different corporations and trust levels often interact with the same set of computing resources.
At the same time, workload balancing, changing service level agreements, and other aspects of today's dynamic IT environments create even more opportunities for misconfiguration, data compromise, and malicious conduct. Infrastructure sharing calls for a high degree of standardization and process automation, which can help improve security by eliminating the risk of operator error and oversight. However, the risks inherent in a massively shared infrastructure mean that cloud computing models must still place a strong emphasis on isolation, identity, and compliance. Cloud computing is available in several service models (and hybrids of these models). Each presents different levels of responsibility for security management. Figure 1 on page 3 depicts the different cloud computing models.
Last week I was briefing Dan Kusnetzky, storage analyst from the Kusnetzky Group, on the value proposition of Real-time Compression and its value to all downstream processes, especially backup. Specifically, I told him that there is NO technology available today that can have even 50% of the effect on the existing backup process that Real-time Compression can have, without changing any architecture in the backup process.
Dan agreed; in fact, he told me that the Real-time Compression technology meets the "Golden Rules of IT". I asked Dan, "What are the Golden Rules of IT?" and he enlightened me. I didn't make these up, so I can't take credit, but I thought they were definitely worth sharing and a good rule of thumb to follow in IT. Here they are:
If it's not broke, don't fix it.
Don't touch it, you'll break it.
If you touched it, you broke it.
Good enough is good enough.
Accept your "jerkdom" (everybody is a Monday morning quarterback).
I have to agree: these are good rules to follow and a great complement to the Real-time Compression technology. This technology fits transparently into any storage environment and can optimize storage by up to 5x without any performance impact, which makes it very simple and one of the only ways to have a significant, compounding budgetary effect for very little dough.
Technology giant IBM on Tuesday said it has emerged as
the top player in the Indian external disk storage systems market for the year 2010.
According to IT research firm IDC, IBM India
has maintained its 2010 leadership with a 26.2 per cent market share (in
revenue terms) and a lead of over four percentage points over its nearest competitor.
“While the overall external disk storage
market in India declined to 1.5 per cent in calendar year 2010,
according to IDC, IBM has been able to grow its hold in the country
given the constant innovation and focus on bringing in storage
efficiency,” Sandeep Dutta, Storage, Systems and Technology Group, IBM
India/South Asia, told PTI.
Also, in Q4 2010, IBM
maintained leadership with a 29 per cent market share and a seven per
cent point lead over its nearest competitor in revenue terms.
In the year 2010, IBM launched products like IBM Storwize V7000 and IBM
System Storage DS8000, which helped it strengthen its leadership
position in the market.
During the year, IBM bagged
orders from Kotak, Suzlon, Oswal mills, CEAT, L&T (ECC division),
Indian Farmer and Fertilizer Cooperative Ltd, Solar Semiconductors and
Ratnamani Metals.
WASHINGTON - 01 Mar 2011: IBM (NYSE:IBM) today announced a major expansion of its Institute for Electronic Government (IEG) in Washington, D.C., adding cloud computing and analytics capabilities for public sector organizations around the world.
IBM has moved and expanded the facility in order to meet the growing demand from Government, Health Care and Education leaders who recognize the potential of cloud computing environments and business analytics technologies to improve efficiencies, reduce costs and tackle energy and budget challenges.
According to recent IBM surveys of technology leaders globally, 83 percent of respondents identified business analytics -- the ability to see patterns in vast amounts of data and extract actionable insights -- as a top priority and a way in which they plan to enhance their competitiveness. In addition, an overwhelming majority of respondents -- 91 percent -- expect cloud computing to overtake on-premise computing as the primary IT delivery model by 2015.
That includes all the same features like replication, thin provisioning, self-optimized flash tier and Cloud Agile, which is the ability to take advantage of cloud storage technology for replication and recovery of data, all in an array that lists starting at under $11,000, Walsh said.
Two solid days at VMworld 2011 and I got to do and see a lot. Here is a breakdown of the top 5 things I saw at VMworld.
#1 The SiliconAngle / Wikibon Cube
You couldn’t miss it. You walk into the show floor and there they
were, larger than life. The SiliconAngle / Wikibon Cube broadcasting
live from VMworld 2011. Guests on the Cube included Tom
Georgens (NTAP), Pat Gelsinger (EMC), David Scott (HP), Rick Jackson
(VMware) as well as many more. The Cube also had 12 Industry
Spotlights. The most interesting spotlight had to do with Storage
Optimization, especially for VMware.
Oh, the times they are a-changing. Now that you can deliver HD TV
live over the internet, the Cube has broadcast from a number of industry
shows and user conferences. The great part about this is that it is like
watching a sporting event covered by ESPN, but for tech.
The Cube brings all of the highlights of these events right into your
computer screen. Now if you can’t make an event, no problem, you can
catch all the most important messages from the Cube. The Cube is now
the new mechanism for delivering content to users in the way they want
to receive the content, TV. For more, check out www.siliconangle.tv
#2 Storage Optimization – Industry Spotlight
In the Storage Optimization industry spotlight, Dave Vellante and his
co-host John Furrier spent the first 15 minutes teeing up the concept.
They discussed storage optimization, where it has been and where it is
going, especially in VMware environments. We are hearing more and more
about storage efficiency technologies. During the next 15 minutes, Dave
and I discussed the five essential storage efficiency technologies.
We also discussed the fact that the IBM Real-time Compression
technology is not only the most efficient and effective compression
technology in the industry, but also that IBM really acquired not
just a real-time “compression” technology but a platform that can do a
number of things in real time. In fact, the five IBM storage efficiency technologies all operate in real time, which is the most effective approach for customers.
We have been hearing a great deal about storage optimization in
VMware environments because virtualizing servers was successful for
the server side of the house, but it didn’t do everything it set out
to do: it didn’t fix the overall IT budget.
Virtualizing servers only pushed the financial problem to the storage
side of the house. Users have told us that when they virtualize their
servers, storage grows as much as 4x. By leveraging the right storage
optimization technologies together, users can get their budgets back
under control and also deliver on the promise of server virtualization.
#3 More Free Time for “Real-life”
While I was on the Cube as a panelist with my good friend Marc Farley
(HPsisyphus, formerly @3ParFarley), Dave asked us what was the most
interesting thing we saw on the show floor while walking around. I
didn’t hesitate in my response. There were two in my mind. First, it
couldn’t be more obvious how fast data is growing. Over 50% of
the 19,000 people there had cameras, taking pictures and video.
That data is going to be stored somewhere. Additionally, they had these
cameras for a reason. Either we have more bloggers and tweeters than
we know about, more marketing people are going to these events or more
people are using social media to inform and educate others. The way in
which users want to receive data is always changing and evolving, and at
least at VMworld 2011 we were delivering content in a number of ways
especially photos and video. All that data will end up in the “cloud”.
The second thing I noticed was the amount of free time VMware has
given back to the IT user. I heard, on more than one occasion, end
users talking about family, vacations and travel instead of the usual
banter about how challenging their jobs are and the issues they have
with their vendors, which is what I normally hear at these shows.
This was not an anomaly. I am chalking it up to the fact that VMware
makes people’s lives easier.
#4 Proximal Data
These “most interesting things” are not in any particular order. I say this because I believe that Proximal Data is THE
most interesting thing I saw at the show. Now Proximal Data just came
out of “stealth” in early August. They didn’t have a booth at VMworld
but they did have a “whisper suite”. So, I have to confess, since I
used to be an analyst, sometimes people will ask me to come take a look
at their technology and their message to see if it is in line with what
is going on in the industry so I got to hear the pitch.
Proximal Data’s message is right on. It hits a very important and
growing topic with VMware these days, the I/O bottleneck on virtual
servers, and they solve this problem in a unique and intelligent way.
First, the problem. One of the issues facing VMware today is the
number of virtual machines that can be hosted by one physical machine.
The more users can get on one system, the more efficient they can be.
The problem is, today systems are running into I/O workload bottlenecks
that are causing a limitation in the number of virtual machines one
system can run.
One way to solve this problem is to add more memory to the host, but that
can be very expensive. You can add more HBAs or NICs, but that can be
expensive and also difficult to manage. You can add more flash cache to
your storage to relieve the I/O bottleneck, but doing that only solves
half the problem; you still need to solve the challenge on the host side,
again with memory or host adapters.
The solution: Proximal Data, which combines advanced I/O management
software with PCI flash cards on the host, for a very reasonable price
per host. The software combined with the card is 100% transparent to
both the virtual servers and the storage, which to me is one of the most
important features of the implementation.
Transparency is the key to any new technology. IT has a ton of
challenges and has done a great deal of work to get their environment to
where it is today. To implement a technology that causes all of that
work to be undone is very painful. Remember, the hardest thing to
change in IT is process, not technology. It’s important to preserve the
process. That is what Proximal Data does. Proximal Data can increase
the I/O capability of a VMware server with just a 5 minute installation
of the PCI card and their software. This technology can double and even
triple the number of virtual machines on any physical server and that
is a tremendous ROI. A new win for efficiency.
There are a number of folks entering this market these days; however,
Proximal does it transparently with no agents, making it the most
user-friendly implementation. While these guys won’t have product until
2012, when it hits the market, I am sure it will be very successful.
#5 Convergence to the Cloud
Are we seeing the coming of the “God Box”? A number of vendors are
talking more and more about public/private cloud, and investing in it.
There are more systems popping up that have servers, networking, high
availability and storage all in one floor tile. These systems are
designed to integrate, scale, manage VMs simply, increase productivity
and ease the management of all possible application deployments in any
business. Additionally, these boxes help you connect to the cloud to
ease the cost burden. Is the pendulum swinging back to the “open
systems” mainframe? Only time will tell.
One more for fun. The first meeting I had at VMworld was with a
potential OEM prospect of the IBM Real-time Compression IP. I have
always said that this technology could revolutionize the data storage
business much like VxVM did for Veritas many years ago. Creating a
standard way to do compression across a number of systems can help users
with implementation as well as ease the storage cost burden. I hope
this moves forward, and I hope more folks who want to OEM the technology step up.
“Storage Efficiency” has become a big topic over the past 12 months. There are a number of new technologies that have come out in the last few years that are helping to deal with storage growth. We all know that data is the root of the decisions that drive business today. The more data you have, hopefully, the better decisions you can make to drive your business to success. The question is, “what is the value (and hence the cost) of the infrastructure to create that success?” What we do know is that the ability to put more data in a highly efficient footprint can give your company a competitive edge. There are five technologies that can help an IT organization create an efficient storage infrastructure. These are:
1) Tiering
2) Virtualization
3) Thin Provisioning
4) Data Deduplication
5) Real-time Compression
It is also important to point out a semantic distinction when talking about storage efficiency, specifically between efficiency and optimization technologies. I think it is useful to define these terms because they lead us to pick the right solutions for what we are trying to accomplish. For the purpose of this post, efficiency will mean making existing capacity more useful, and optimization will mean making more capacity out of existing capacity.
Using these definitions, technologies such as Tiering, Virtualization and Thin Provisioning are efficiency technologies. These technologies help to utilize the existing capacity that you have.
Tiering is a technology that is used on about 10% of your data or less. It moves data that requires higher performance to flash storage. Good tiering technology analyzes data access patterns and moves the most active data to the highest-performing disk. It doesn’t really change the amount of physical capacity that is required; it just changes what type of capacity is required and lets IT make sure data is served as quickly and efficiently as possible.
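The tiering decision described above (count accesses, then move the busiest data to flash) can be sketched in a few lines of Python. This is a simplified illustration, not how Easy Tier or any other product works internally: the function name `plan_promotions`, the idea of tracking fixed-size "extents" by integer ID, and the 10% flash budget (taken from the "about 10% of your data or less" figure above) are all assumptions.

```python
from collections import Counter
from typing import Iterable, Set


def plan_promotions(access_log: Iterable[int], total_extents: int,
                    flash_fraction: float = 0.10) -> Set[int]:
    """Hypothetical planner: pick the busiest extents for the flash tier.

    `access_log` is an assumed stream of extent IDs, one per I/O.
    At most `flash_fraction` of all extents (at least one) are promoted.
    """
    counts = Counter(access_log)
    flash_slots = max(1, int(total_extents * flash_fraction))
    # most_common() orders extents from most to least accessed.
    return {extent for extent, _ in counts.most_common(flash_slots)}


if __name__ == "__main__":
    # Hypothetical access log of extent IDs on a 20-extent volume.
    log = [5, 5, 5, 7, 7, 3, 5, 7, 1]
    print(sorted(plan_promotions(log, total_extents=20)))  # prints [5, 7]
```

Note that the total capacity does not change: the promoted extents simply live on a faster type of disk, which matches the point that tiering changes what type of capacity is used rather than how much.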
Virtualization technology allows IT to use disk capacity as efficiently as possible. Until recently, storage utilization rates were around 50%. By leveraging virtualization technology, IT can pool storage so it doesn’t need to purchase capacity needlessly. Virtualization can be used on 50% to 60% of your storage, but it doesn’t change your physical capacity infrastructure requirements; at most, it lets users take advantage of the 20% to 40% of their capacity that they previously weren’t using.
Like virtualization, thin provisioning can be used on 50% to 60% of your capacity; however, thin provisioning gives IT about 10% to 40% of its capacity back. Thin provisioning helps IT manage existing capacity and utilization by making it much easier to make capacity available to users, but again, it doesn’t change the amount of physical storage infrastructure required.
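A minimal Python sketch of the allocate-on-first-write idea behind thin provisioning follows. The class names `StoragePool` and `ThinVolume` are hypothetical and the model is deliberately simplified (whole blocks, no reclamation); it only shows why volumes can promise more capacity than the pool physically holds while consuming physical blocks only for data actually written.

```python
from typing import Dict


class StoragePool:
    """Hypothetical shared pool of physical blocks."""

    def __init__(self, physical_blocks: int):
        self.free = list(range(physical_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("pool exhausted: add physical capacity")
        return self.free.pop()


class ThinVolume:
    """Presents `size` virtual blocks but draws physical blocks only on first write."""

    def __init__(self, size: int, pool: StoragePool):
        self.size = size
        self.pool = pool
        self.mapping: Dict[int, int] = {}  # virtual block -> physical block

    def write(self, vblock: int) -> int:
        if not 0 <= vblock < self.size:
            raise IndexError("write beyond volume size")
        if vblock not in self.mapping:
            # First write to this block: allocate from the shared pool.
            self.mapping[vblock] = self.pool.allocate()
        return self.mapping[vblock]

    @property
    def allocated(self) -> int:
        return len(self.mapping)


if __name__ == "__main__":
    pool = StoragePool(physical_blocks=100)
    # Two volumes each promise 80 blocks (160 total) from a 100-block pool.
    a, b = ThinVolume(80, pool), ThinVolume(80, pool)
    for v in range(10):
        a.write(v)
    for v in range(5):
        b.write(v)
    b.write(0)  # rewriting an already-allocated block takes no new space
    print(a.allocated, b.allocated, len(pool.free))  # prints 10 5 85
```

The pool still needs a physical block for every block actually written, which is why thin provisioning improves utilization without reducing the physical infrastructure needed for the data itself.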
Optimization technologies help IT better manage its physical storage footprint. They optimize existing infrastructure by allowing users to put more capacity in the same physical space. The two technologies currently in use today are data deduplication and real-time compression.
Optimization technologies are a bit tricky. They require a balance between optimization on one side and performance and availability on the other. At the end of the day, IT chooses the storage it buys with two very important characteristics in mind: performance and availability. Optimization technologies must not affect these characteristics. It is for this reason that data deduplication really isn’t ready for “prime time” on primary, active storage: it creates too much of a performance impact on primary, active data. Today, data deduplication could be used on about 10% to 15% of the primary, less active capacity in the data center, and it provides only about 30% to 50% optimization of that capacity. In other words, deduplication technology can reduce the physical infrastructure by as much as 10%, meaning IT may not need to buy as much physical capacity.
Real-time compression, on the other hand, has one of the most dramatic effects on primary storage capacity. Real-time compression can be used on as much as 85% of the storage footprint and can compress data by 50% to 80%. As a result, real-time compression could let IT purchase as much as 70% less overall storage capacity. Real-time compression also does not affect the main characteristics for which users buy storage (performance and availability). IT could have as much as 70% less footprint but keep the same amount of data, or more, online. Additionally, IT can now purchase storage opportunistically without such a dramatic impact on its infrastructure, processes or budgets. This allows companies to keep more capacity online and available, so they can run analytics on more data and become more competitive.
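The capacity figures in the last two paragraphs come from a simple multiplication: the fraction of the footprint a technique can be applied to, times how much it shrinks that data. A small Python sketch makes the arithmetic explicit; the function name `net_footprint_reduction` is hypothetical, and the model ignores metadata overhead and assumes the reduction rate applies uniformly to the eligible data.

```python
def net_footprint_reduction(applicable_fraction: float, reduction_rate: float) -> float:
    """Fraction of total capacity saved when a technique that shrinks data by
    `reduction_rate` applies to `applicable_fraction` of the footprint.

    Simplified model (assumption): no metadata overhead, uniform savings.
    """
    return applicable_fraction * reduction_rate


if __name__ == "__main__":
    # Upper-end figures quoted in the post.
    dedup = net_footprint_reduction(0.15, 0.50)        # dedup on primary, less active data
    compression = net_footprint_reduction(0.85, 0.80)  # real-time compression
    print(f"dedup: {dedup:.1%}, compression: {compression:.1%}")
    # prints dedup: 7.5%, compression: 68.0%
```

The compression result lines up with the "as much as 70% less" figure; for deduplication this simple model gives roughly 7.5%, in the same range as the "as much as 10%" cited above.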
When deciding which storage efficiency technology will have the most effective impact on your overall environment and budget, start with optimization technologies to get data growth under control. Adding value to the lines of business that can drive revenue with more data will make you a hero and your business more successful.
by Steve Kenniston
The first city on my Eastern European trip was Moscow. I think the
traffic here is worse than the 101 in Silicon Valley during the dot com
era. That said, it was a great visit. I spoke at the Information
Infrastructure Conference at the Swissotel convention center in Moscow.
It was the first time I spoke to a group of people with an
interpreter. It was like being at the UN. The two main topics were
Storage Efficiency and Real-time Compression.
I spoke with a few customers and members of the press, and in dealing
with their data growth challenges they wanted to know, “When it comes to
big data, what is next? Is it ‘huge data’?” Data growth is clearly a
concern. Interestingly enough, though, most of the questions came around
my title of “Evangelist”. One reporter told me, “If an Evangelist is
‘preaching the word of storage’, then why not just call yourself an
Apostle?” How do you think that would look on an IBM business card:
Global Storage Apostle?
The next day I did a day of “sales enablement” in the Moscow office.
We discussed mostly how to sell and position Real-time Compression and
what is next for the technology. I was very impressed with the team.
They were very technical and knew quite a bit about Real-time
Compression and really wanted to know in more detail how the technology
was invented. This means they are really talking about the technology
and the customers are drilling down into the next level of detail.
There are a lot of good opportunities for the technology in Moscow, and I
look forward to hearing more about the success of Real-time Compression.
I didn’t have a lot of time to sight see but I did make it to Red
Square. You can actually buy a beer outside in Red Square and walk
around. So I did. I took a few photos and then as the US was getting
going, I had some work calls to attend to. That evening I spent on the
34th floor of my hotel having dinner. It was a great view of Moscow. I hope to come back.
by Steve Kenniston
After landing in Warsaw, I got into a car with the local sales leader
for Poland and we drove to the event location. It was a two-hour drive.
First, the roads and the land in Poland reminded me very much of my
hometown in Maine: very scenic and rural, but beautiful and peaceful.
We talked storage for two hours, and I am always fascinated by the thirst
for knowledge I find when I travel. It was a great ride, followed
by a customer reception and some local Polish brew.
Thursday I spent the day in Sterdyn, Poland, for IBM Storage
University. There were 30 customers at the event and it went very
well. The event was at Palac Ossolinski, which is used today as an
event center but has a very rich history; at one point it was used as
a medical facility in WWII. The photo is of the building where we had
the event. The topics we covered were:
The customers were very interactive and provided a lot of insight into
their environments. Interestingly enough, I learned during our customer
reception that IBM storage is #1 in Poland, with HP second and EMC
third. This is a true testament to the IBM sellers and the customers
who use IBM products every day to drive their business. I also
learned that the data breakdown in Poland is 90% block and 10% file,
which I found interesting; I would like to check back 12 months from
today to see how it has changed.
I did learn something very interesting in Poland. The question was
asked, “Why XIV? What is so special about XIV?” The answer was
awesome. It started with two questions:
1) How old is RAID?
2) How old is your iPhone?
The reality is that data growth is outpacing what traditional RAID can
handle, and data profiles are changing as well. Together, these have
driven new technologies like Cleversafe, cloud computing, Hadoop and
XIV. Just as the iPhone is a new approach to the smartphone based on
new things we know about how smartphones are used, we now know
more about how data and storage are used. New ways to deliver
capacity and performance are needed to keep up with the
changing times. I thought it was a very good answer, in terms that make sense.
Thursday evening I traveled back to Warsaw where I got in a bit late
and just went to a local pub, Sketch. Grabbed a small bite and some
local mead and then headed back to the hotel. I did get to see the
local Palace of Culture and Science in the middle of Warsaw, very
impressive, built as a gift from Russia to Poland.
I have an early flight to Prague. I am very excited about this part
of the journey as I have always wanted to travel to Prague. Press
meeting right when I land. Stay tuned.
There has been significant discussion in the industry about
storage optimization and making better use of storage capacity. A number
of storage vendors have successfully marketed data de-duplication for offline/backup applications, reducing the volume of backup data by a factor of 5-15:1, according to Wikibon user input.
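This article quotes reduction ratios in several places (5-15:1 for backup de-duplication here, 3-6:1 and 1.5-2:1 later). A ratio of R:1 means only 1/R of the original capacity is stored, so the saving is 1 - 1/R, which grows much more slowly than the ratio itself. The short Python sketch below makes that arithmetic concrete; the function names are illustrative only.

```python
def savings_fraction(ratio: float) -> float:
    """Fraction of capacity saved by an R:1 reduction ratio (1 - 1/R)."""
    if ratio < 1:
        raise ValueError("a reduction ratio must be at least 1:1")
    return 1.0 - 1.0 / ratio


def stored_capacity(original_tb: float, ratio: float) -> float:
    """Physical capacity needed to hold original_tb after R:1 reduction."""
    return original_tb / ratio


# Ratios quoted in this article, from in-line compression to backup de-dupe.
for ratio in (1.5, 2, 3, 5, 6, 15):
    print(f"{ratio:>4}:1 -> {savings_fraction(ratio):.1%} saved")
```

Moving from 5:1 to 15:1 takes the saving from 80% to about 93% of capacity, while moving from 1.5:1 to 2:1 takes it from about 33% to 50%.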
Data de-duplication as applied to backup use cases is different
from compression. Compression actually changes the data, using
algorithms to encode it into a more compact representation that takes fewer bits.
With de-duplication, the data is not changed; rather, copies 2 through N are deleted
and replaced with pointers to a 'master' instance of the data.
Single-instancing can be thought of as synonymous with de-duplication.
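To make the pointer idea concrete, here is a minimal Python sketch of block-level de-duplication, assuming fixed-size blocks and a SHA-256 digest as each block's identity (both are illustrative choices, not taken from any product named here). Each distinct block is stored once as the 'master' instance, every later copy is represented only by a pointer to it, and the original byte stream is rebuilt by following the pointers. Unlike compression, no block's bytes are transformed.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed 4 KiB blocks, for illustration only


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Return (store, pointers): one master copy per distinct block, plus a
    pointer (the block's digest) for every block position in the input."""
    store: dict[str, bytes] = {}
    pointers: list[str] = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        key = hashlib.sha256(block).hexdigest()
        # Only the first copy is stored; copies 2..N become pointers.
        store.setdefault(key, block)
        pointers.append(key)
    return store, pointers


def rehydrate(store: dict[str, bytes], pointers: list[str]) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[key] for key in pointers)


data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store, pointers = deduplicate(data)
print(len(pointers), len(store))  # 4 2
assert rehydrate(store, pointers) == data
```

This sketch trusts a strong digest and skips a byte-for-byte check; systems that rely on weaker signatures must also compare candidate blocks byte for byte.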
Traditional data de-duplication technologies, however, are
generally unsuitable for online or primary storage applications, because
the overheads of the algorithms required to de-duplicate
data would unacceptably lengthen response times. For example, popular
data de-duplication solutions such as those from Data Domain, ProtecTIER
(Diligent/IBM), FalconStor and EMC/Avamar are not used for reducing
the capacity of online storage.
There are three primary approaches to optimizing online storage,
reducing capacity requirements and improving overall storage
efficiencies. Generally, Wikibon refers to these in the broad category
of on-line or primary data compression, although the industry will often
use terms like de-duplication (e.g. NetApp A-SIS) and single
instancing. These data reduction technologies are illustrated by the
following types of solutions:
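NetApp A-SIS and EMC Celerra, which employ either “data de-duplication light” or single-instance technology embedded in the storage array;
Ocarina, which uses host-managed, offline ('split-path') data compression; and
IBM Real-time Compression (formerly Storwize), which performs in-line data compression between servers and the storage network.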
Each of these approaches has certain benefits and drawbacks. The
obvious benefit is reduced storage costs. However, each solution places
another technology layer in the network and increases complexity and
Array-based data reduction
Array-based data reduction technologies such as A-SIS work inside the
storage array to reduce primary storage capacity. The de-duplication
feature of WAFL (NetApp’s Write Anywhere File Layout) records a weak
32-bit digital signature of each 4K block at write time and places it
in a signature file in the metadata. Blocks whose signatures match are
candidate duplicates; they are then compared bit-by-bit to ensure that
there is no hash collision before they are shared. The work of identifying
the duplicates is similar to the snap technology and is done in the
background if controller resources are sufficient. By default, it runs once
every 24 hours and whenever the percentage of changed data reaches 20%.
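The two-phase scheme described above (cheap signatures recorded at write time, duplicates resolved later in the background with a bit-by-bit check) can be sketched in Python as follows. This is a hypothetical model of a single volume, not NetApp's code: CRC-32 from zlib stands in for the weak 32-bit signature because the text does not name the actual function, and the class and method names are invented. The `should_run` helper encodes the default trigger stated in the text.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 4096  # A-SIS de-duplicates fixed 4K blocks


def weak_signature(block: bytes) -> int:
    # Assumption: CRC-32 stands in for the weak 32-bit signature.
    return zlib.crc32(block)


def should_run(hours_since_last: float, changed_fraction: float) -> bool:
    # Default trigger from the text: every 24 hours, or when changes reach 20%.
    return hours_since_last >= 24 or changed_fraction >= 0.20


class SignatureDedupVolume:
    """Hypothetical single volume with write-time signatures and a
    background de-duplication pass."""

    def __init__(self) -> None:
        self.physical: dict[int, bytes] = {}   # physical slot -> 4K block
        self.block_map: list[int] = []         # logical block -> physical slot
        self.change_log: list[tuple[int, int]] = []  # (signature, logical) since last pass
        self.index: dict[int, list[int]] = defaultdict(list)  # signature -> master slots
        self._next_slot = 0

    def write(self, block: bytes) -> None:
        """Write path: store the block as-is and log its weak signature."""
        slot = self._next_slot
        self._next_slot += 1
        self.physical[slot] = block
        self.block_map.append(slot)
        self.change_log.append((weak_signature(block), len(self.block_map) - 1))

    def read(self, logical: int) -> bytes:
        return self.physical[self.block_map[logical]]

    def dedupe_pass(self) -> int:
        """Background pass: share blocks whose signatures match *and* whose
        bytes match, since a signature match alone could be a collision."""
        freed = 0
        for sig, logical in self.change_log:
            slot = self.block_map[logical]
            block = self.physical[slot]
            for master in self.index[sig]:
                if self.physical[master] == block:  # bit-by-bit comparison
                    self.block_map[logical] = master
                    del self.physical[slot]          # reclaim the duplicate
                    freed += 1
                    break
            else:
                self.index[sig].append(slot)         # first copy becomes master
        self.change_log.clear()
        return freed


vol = SignatureDedupVolume()
for b in (b"a" * BLOCK_SIZE, b"b" * BLOCK_SIZE, b"a" * BLOCK_SIZE):
    vol.write(b)
if should_run(hours_since_last=24, changed_fraction=0.0):
    print(vol.dedupe_pass(), len(vol.physical))  # 1 2
```

Keeping the signature cheap makes the write path fast, and the byte comparison in the background pass is what keeps a weak 32-bit signature safe.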
There are several main disadvantages of an A-SIS solution:
With A-SIS, de-duplication can only occur within a single
flex-volume (not a traditional volume), meaning candidate blocks must be
co-resident within the same volume to be eligible for comparison. The
de-duplication is based on fixed 4K blocks, rather than the variable-size
blocks of (say) IBM/Diligent. This limits the de-duplication potential.
There is a complicated set of constraints when A-SIS is used
together with snapshots, and these vary with the software level. Snapshots
taken before de-duplication runs take precedence over de-duplication
candidacy in order to maintain data integrity. This limits the space
savings potential of de-dupe. Specifically, NetApp's de-dupe savings are
not cumulative with space-efficient snapshots. See (technical description);
The performance overheads of de-duplication described above
mean that A-SIS should not be applied to a highly utilized controller
(where the most benefit is likely to be achieved);
There is a metadata overhead of up to 6%;
To exploit this feature, users are locked in to NetApp storage.
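The first drawback above notes that fixed 4K blocks limit de-duplication potential. The reason is easiest to see with a one-byte insertion: every fixed block boundary after the edit shifts, so almost nothing matches any more, whereas content-defined (variable-size) chunking places boundaries based on the data itself and re-synchronizes shortly after the edit. The Python sketch below uses a Gear rolling hash, a generic content-defined chunking technique; the parameters are arbitrary, and it is not a description of how IBM/Diligent or any other vendor actually chunks data.

```python
import random

# Illustrative parameters only: ~4 KiB average chunks, bounded 1-16 KiB.
MASK = (1 << 12) - 1
MIN_SIZE, MAX_SIZE = 1024, 16 * 1024
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # per-byte random values


def fixed_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    """Fixed-size blocks, as in a 4K-block scheme."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunks: cut where the rolling hash's low bits are zero.
    The low 12 bits depend only on the last 12 bytes, so cut points follow
    the content rather than absolute offsets."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        if length < MIN_SIZE:
            continue
        if (h & MASK) == 0 or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old: list[bytes], new: list[bytes]) -> float:
    """Share of the new chunks that already exist among the old chunks."""
    seen = set(old)
    return sum(chunk in seen for chunk in new) / len(new)


data = random.Random(0).randbytes(256 * 1024)  # Python 3.9+
edited = b"!" + data  # insert a single byte at the front
print(f"fixed 4K blocks shared: {shared_fraction(fixed_chunks(data), fixed_chunks(edited)):.0%}")
print(f"content-defined shared: {shared_fraction(cdc_chunks(data), cdc_chunks(edited)):.0%}")
```

Expect the fixed-block figure to fall to essentially zero, while the content-defined figure stays high because only the chunk around the edit changes.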
IT Managers should note that A-SIS is included as a no-charge
standard offering within NetApp's Nearline component of ONTAP, the
company's storage OS.
Host-managed offline data compression solutions
Ocarina is an example of a host-managed data reduction offering, or what it
calls 'split-path.' It consists of an offline process that reads files
through an appliance, compresses those files and writes them back to
disk. When a file is requested, another appliance re-hydrates the data and
delivers it to the application. The advantage of this approach is much
higher levels of compression, because the process is offline and can use
much more robust algorithms. A reasonable planning assumption is a
reduction ratio of 3:1 to 6:1, sometimes higher for initial
ingestion and read-only Web environments. However, because data must be
re-hydrated when it is updated, classical production
environments may see lower ratios.
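The split-path flow described above (an offline pass that compresses stored files, re-hydration on the read path, and modified files landing back on disk uncompressed) can be modelled in a few lines of Python. This is a hypothetical sketch: zlib stands in for the compression step, whereas Ocarina's algorithms are proprietary, and the class and method names are invented for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass
class StoredFile:
    payload: bytes
    compressed: bool = False


class SplitPathStore:
    """Hypothetical model of host-managed, offline ('split-path') reduction."""

    def __init__(self) -> None:
        self.files: dict[str, StoredFile] = {}

    def write(self, name: str, data: bytes) -> None:
        # New or modified data lands on disk uncompressed.
        self.files[name] = StoredFile(data)

    def read(self, name: str) -> bytes:
        f = self.files[name]
        # Re-hydrate on the read path if the file was compressed offline.
        return zlib.decompress(f.payload) if f.compressed else f.payload

    def offline_compress_pass(self, level: int = 9) -> None:
        """Batch job: read uncompressed files, compress, write them back."""
        for f in self.files.values():
            if not f.compressed:
                f.payload = zlib.compress(f.payload, level)
                f.compressed = True

    def bytes_on_disk(self) -> int:
        return sum(len(f.payload) for f in self.files.values())


store = SplitPathStore()
store.write("report.txt", b"quarterly numbers " * 1000)
before = store.bytes_on_disk()
store.offline_compress_pass()           # batch job, e.g. run overnight
after = store.bytes_on_disk()
print(before, after)                    # far fewer bytes after the pass
store.write("report.txt", store.read("report.txt") + b" revised")
print(store.files["report.txt"].compressed)  # False: needs re-compressing
```

The last two lines show why high-update workloads erode the savings: every modified file waits, uncompressed, for the next offline pass.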
In the case of Ocarina, the company has developed proprietary
algorithms that can improve reduction ratios on many existing file types
(e.g. JPEG, PDF, MPEG), an approach that is unique in the industry.
The main drawbacks of host-managed data reduction solutions are:
The expense of the solution is not insignificant, due to the
appliance and server costs needed to perform compression. In
infrequently accessed, read-only or write-light environments, these
costs will be justified.
To achieve these benefits, all files must be ingested, which is
a slow process. Picking the right use cases will minimize this issue.
After a file is read and modified, it is written back to disk
uncompressed. To regain the savings, files must be re-compressed,
which limits use cases to infrequently accessed files.
Ocarina currently supports only files, unlike NetApp A-SIS,
which supports both file and block-based storage. However, Ocarina's
implementation offers several advantages over A-SIS (remember A-SIS is
The solution is not highly scalable because the processes related to backup, re-hydration, and data movement are complicated.
On balance, solutions such as Ocarina are highly suitable and
cost-effective for infrequently accessed data and read-intensive
applications. High-update environments should be avoided.
In-line data compression
IBM Real-time Compression offers in-line data compression whereby a device sits between servers and the storage network (see Shopzilla's architecture). Wikibon members indicate that a compression ratio of 1.5:1 to 2:1 is a reasonable rule of thumb.
The main advantage of the IBM Real-time Compression approach is
very low latency (i.e. microseconds) and improved performance. Storage
performance is improved because compression occurs before data hits the
storage network. As a result, all data in the storage network is
compressed, meaning less data is sent through the SAN, cache, internal
array, and disk devices, minimizing resource requirements and backup
windows by 40% or more, according to Wikibon estimates.
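An in-line device compresses each write before it reaches the storage network and decompresses each read on the way back, which is why every downstream component sees less data. The Python sketch below illustrates that placement only: the class is hypothetical, zlib at a fast level stands in for whatever algorithm IBM Real-time Compression actually uses, and a dict plays the role of the back-end storage.

```python
import zlib


class InlineCompressor:
    """Hypothetical in-line device between hosts and storage."""

    def __init__(self, backend: dict[str, bytes]) -> None:
        self.backend = backend   # stands in for the SAN/array behind the device
        self.bytes_in = 0        # bytes received from hosts
        self.bytes_out = 0       # bytes sent on to storage

    def write(self, key: str, data: bytes) -> None:
        # Compress before the data reaches the storage network; a fast level
        # is chosen because the device sits in the I/O path.
        compressed = zlib.compress(data, 1)
        self.bytes_in += len(data)
        self.bytes_out += len(compressed)
        self.backend[key] = compressed

    def read(self, key: str) -> bytes:
        # Hosts always see uncompressed data.
        return zlib.decompress(self.backend[key])

    def ratio(self) -> float:
        return self.bytes_in / self.bytes_out if self.bytes_out else 1.0


backend: dict[str, bytes] = {}
device = InlineCompressor(backend)
record = b"customer_id,order_total,region\n" * 200
device.write("blk0", record)
assert device.read("blk0") == record
print(f"{device.ratio():.1f}:1 on this toy input")
```

The toy input is highly repetitive, so its ratio will be far above the 1.5-2:1 that Wikibon members report for real data; the point is where compression happens, not how much.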
The IBM Real-time Compression approach has two main drawbacks:
Costs of appliances and network re-design to exploit the compression devices. The Wikibon community estimates that a clear ROI will be realized in shops with more than 30TB;
Complexity of recovery; specifically, users need to plan for
re-hydration of data when performing recovery of backed-up files (i.e.
they need to have a Storwize engine or software present to recover from
a data loss).
On balance, the advantage of an Ocarina or IBM Real-time Compression
approach is that it can be applied to any file-based storage (i.e.
heterogeneous devices). NetApp and other array-based solutions lock
customers into a particular storage vendor but have certain advantages
as well. For example, they are simpler to implement because they are
An Ocarina approach is best applied in read-intensive
environments, where it will achieve better reduction ratios due to its
post-process/batch ingestion methodology. IBM Real-time Compression will
achieve the highest levels of compression and ROI in general-purpose
enterprise data centers of 30TB or greater.
Action Item: On-line data reduction is rapidly coming to
mainstream storage devices in your neighborhood. Storage executives
should familiarize themselves with the various technologies in this
space and demand that storage vendors apply capacity optimization
techniques to control storage costs.