Co-author: David Floyer
There has been significant discussion in the industry about
storage optimization and making better use of storage capacity. A number
of storage vendors have successfully marketed data de-duplication for offline/backup applications, reducing the volume of backup data by a factor of 5-15:1, according to Wikibon user input.
Data de-duplication as applied to backup use cases is different
from compression in that compression actually changes the data using
algorithms to create a computational byproduct and write fewer bits.
With de-duplication, the data is not changed; rather, copies 2-N are deleted
and replaced with pointers to a 'master' instance of the data.
Single-instancing can be thought of as synonymous with de-duplication.
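The distinction can be made concrete with a minimal Python sketch. It is illustrative only and does not represent any vendor's implementation; the function name `single_instance`, the use of SHA-256 as the lookup key, and the sample payloads are all assumptions made for the example.

```python
import hashlib
import zlib

def single_instance(copies):
    """Keep one 'master' per distinct payload; return (masters, pointers)."""
    masters = {}    # digest -> the single stored master instance
    pointers = []   # one pointer (digest) per original copy, in order
    for data in copies:
        digest = hashlib.sha256(data).hexdigest()
        masters.setdefault(digest, data)   # copies 2..N are not stored again
        pointers.append(digest)
    return masters, pointers

# Hypothetical backup set: five identical copies plus one different file.
backups = [b"quarterly report " * 100] * 5 + [b"meeting notes " * 100]
masters, pointers = single_instance(backups)

# De-duplication: the stored bytes are untouched; duplicates became pointers.
restored = [masters[p] for p in pointers]

# Compression: the bytes themselves are re-encoded into fewer bits.
compressed = zlib.compress(backups[0])
```

Following the pointers returns every copy byte-for-byte, while the compressed payload has to be decompressed before it is usable again.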
Traditional data de-duplication technologies, however, are
generally unsuitable for online or primary storage applications because
the overheads associated with the algorithms required to de-duplicate
data will unacceptably elongate response times. As an example, popular
data de-duplication solutions such as those from Data Domain, ProtecTIER
(Diligent/IBM), FalconStor and EMC/Avamar are not used for reducing
capacities of online storage.
There are three primary approaches to optimizing online storage,
reducing capacity requirements and improving overall storage
efficiencies. Generally, Wikibon refers to these in the broad category
of on-line or primary data compression, although the industry will often
use terms like de-duplication (e.g. NetApp A-SIS) and single
instancing. These data reduction technologies are illustrated by the
following types of solutions:
- NetApp A-SIS and EMC Celerra which employ either “data de-duplication light” or single-instance technology embedded into the storage array;
- Host-managed offline data reduction solutions such as Ocarina Networks;
- In-line data compression appliances available from IBM Real-time Compression.
Unlike some data reduction solutions for backup, these three approaches use lossless data compression algorithms, meaning the original bits can always be reconstructed exactly.
Each of these approaches has certain benefits and drawbacks. The
obvious benefit is reduced storage costs. However each solution places
another technology layer in the network and increases complexity and
Array-based data reduction
Array-based data reduction technologies such as A-SIS operate
in-line as data is being written to reduce primary storage capacity. The
de-duplication feature of WAFL (NetApp’s Write Anywhere File Layout)
identifies duplicates at the 4K block level. At write time it creates a
weak 32-bit digital signature of each 4K block and places it into a
signature file in the metadata; blocks with matching signatures are then
compared bit-by-bit to ensure that a hash collision is not mistaken for
a duplicate. The work of identifying
the duplicates is similar to the snap technology and is done in the
background if controller resources are sufficient. By default it runs once
every 24 hours and every time the percentage of changes reaches 20%.
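The signature-then-verify approach described above can be sketched roughly as follows. This is a simplified illustration, not NetApp's code: CRC-32 stands in for the weak 32-bit signature, and the names `dedupe_volume`, `rebuild` and `BLOCK_SIZE` are hypothetical.

```python
import zlib

BLOCK_SIZE = 4096  # 4K fixed blocks, as in the description above

def dedupe_volume(data):
    """Return (unique_blocks, block_map) for one volume's worth of bytes."""
    unique_blocks = []   # blocks that are physically stored
    signatures = {}      # weak 32-bit signature -> indexes into unique_blocks
    block_map = []       # logical block number -> index into unique_blocks
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        sig = zlib.crc32(block)  # assumed stand-in for the weak signature
        match = None
        for idx in signatures.get(sig, []):
            # A matching signature is only a candidate; the full compare
            # rules out a hash collision before the block is shared.
            if unique_blocks[idx] == block:
                match = idx
                break
        if match is None:
            unique_blocks.append(block)
            match = len(unique_blocks) - 1
            signatures.setdefault(sig, []).append(match)
        block_map.append(match)
    return unique_blocks, block_map

def rebuild(unique_blocks, block_map):
    """Follow the pointers to reconstruct the original volume contents."""
    return b"".join(unique_blocks[i] for i in block_map)

# Hypothetical volume: four identical 4K blocks and one different block.
volume = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
unique_blocks, block_map = dedupe_volume(volume)
```

Because candidates are looked up only within the one `data` argument, the sketch also mirrors the single-volume scope: blocks in different volumes are never compared.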
In addition, there are several main disadvantages of an A-SIS solution, including:
- With A-SIS, de-duplication can only occur within a single
flex-volume (not traditional volume), meaning candidate blocks must be
co-resident within the same volume to be eligible for comparison. The
deduplication is based on 4k fixed blocks, rather than the variable
block of (say) IBM/Diligent. This limits the de-duplication potential.
- There is a complicated set of constraints when A-SIS is used
together with different snaps depending on the level of software. Snaps
made before deduplication will overrule de-duplication candidacy in
order to maintain data integrity. This limits the space savings
potential of de-dupe. Specifically, NetApp's de-dupe is not cumulative
to space efficient snapshots. See (technical description);
- The performance overheads of deduplication as described above
mean that A-SIS should not be applied to a highly utilized controller
(where the most benefit is likely to be achieved);
- There is an overhead for the metadata (up to 6%);
- To exploit this feature, users are locked in to NetApp storage.
IT Managers should note that A-SIS is included as a no-charge
standard offering within NetApp's Nearline component of ONTAP, the
company's storage OS.
Host-managed offline data compression solutions
Ocarina Networks is an example of a host-managed data reduction offering or what it
calls 'split-path.' It consists of an offline process that reads files
through an appliance, compresses those files and writes them back to
disk. When a file is requested, another appliance re-hydrates data and
delivers it to the application. The advantage of this approach is much
higher levels of compression because the process is offline and can use
more robust algorithms. A reasonable planning assumption is that
reduction ratios will range from 3-6:1, and sometimes higher, for initial
ingestion and read-only Web environments. However, because of the need
to re-hydrate when new data is written, classical production
environments may see lower ratios.
In the case of Ocarina, the company has developed proprietary
algorithms that can improve reduction ratios on many existing file types
(e.g. jpeg, pdf, mpeg, etc), which is unique in the industry.
The main drawbacks of host-managed data reduction solutions are:
- The expense of the solution is not insignificant due to
appliance and server costs needed to perform compression. In
infrequently accessed, read-only or write-light environments, these
costs will be justified.
- To achieve these benefits, all files must be ingested, which is
a slow process. Picking the right use cases will minimize this issue.
- After a file is read and modified, it is written back to disk
uncompressed. To achieve savings, files must be re-compressed,
limiting use cases to infrequently accessed files.
- Ocarina currently supports only files, unlike NetApp A-SIS
which supports both file and block-based storage. However Ocarina's
implementation offers several advantages over A-SIS (remember A-SIS is
- The solution is not highly scalable because the processes related to backup, re-hydration, and data movement are complicated.
On balance, solutions such as Ocarina are highly suitable and
cost-effective for infrequently accessed data and read-intensive
applications. High update environments should be avoided.
In-line data compression
IBM Real-time Compression offers in-line data compression whereby a device sits between servers and the storage network (see Shopzilla's architecture). Wikibon members indicate a compression ratio of 1.5-2:1 is a reasonable rule-of-thumb.
The main advantage of the IBM Real-time Compression approach is
very low latency (i.e. microseconds) and improved performance. Storage
performance is improved because compression occurs before data hits the
storage network. As a result, all data in the storage network is
compressed, meaning less data is sent through the SAN, cache, internal
array, and disk devices, minimizing resource requirements and backup
windows by 40% or more, according to Wikibon estimates.
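As a quick check on how a compression ratio translates into data volume, the arithmetic can be sketched as below. The helper names `reduction_fraction` and `stored_tb` are made up for illustration, and the 30TB figure is just a sample input.

```python
def reduction_fraction(ratio):
    """Fraction of data volume removed by an N:1 compression ratio."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1 (1:1 means no savings)")
    return 1.0 - 1.0 / ratio

def stored_tb(logical_tb, ratio):
    """Physical capacity consumed by logical_tb of data at an N:1 ratio."""
    return logical_tb / ratio

low = reduction_fraction(1.5)    # lower end of the rule-of-thumb range
high = reduction_fraction(2.0)   # upper end of the rule-of-thumb range
print(f"1.5:1 removes {low:.0%} of the data, 2:1 removes {high:.0%}")
print(f"30TB of data at 2:1 occupies {stored_tb(30, 2.0):.0f}TB")
```

A 1.5-2:1 ratio therefore means roughly one third to one half less data moving through the storage network, which is consistent with the 40% figure sitting inside that range.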
There are two main drawbacks of the IBM Real-time Compression approach, including:
- Costs of appliances and network re-design to exploit the compression devices. The Wikibon community estimates clear ROI will be realized in shops with more than 30TB;
- Complexity of recovery. Specifically, users need to plan for
re-hydration of data when performing recovery of backed up files (i.e.
they need to have a Storwize engine or software present to recover from
a data loss).
On balance, the advantage of an Ocarina or IBM Real-time Compression
approach is that it can be applied to any file-based storage (i.e.
heterogeneous devices). NetApp and other array-based solutions lock
customers into a particular storage vendor but have certain advantages
as well. For example, they are simpler to implement because they are
An Ocarina approach is best applied in read-intensive
environments where it will achieve better reduction ratios due to its
post-process/batch ingestion methodology. IBM Real-time Compression will
achieve the highest levels of compression and ROI in general purpose
enterprise data centers of 30TB or greater.
Action Item: On-line data reduction is rapidly coming to
mainstream storage devices in your neighborhood. Storage executives
should familiarize themselves with the various technologies in this
space and demand that storage vendors apply capacity optimization
techniques to control storage costs.
Two solid days at VMworld 2011 and I got to do and see a lot. Here is a breakdown of the top 5 things I saw at VMworld.
#1 The SiliconAngle / Wikibon Cube
You couldn’t miss it. You walked onto the show floor and there it
was, larger than life: the SiliconAngle / Wikibon Cube, broadcasting
live from VMworld 2011. Guests on the Cube included Tom
Georgens (NTAP), Pat Gelsinger (EMC), David Scott (HP) and Rick Jackson
(VMware), as well as many more. The Cube also had 12 Industry
Spotlights. The most interesting spotlight had to do with Storage
Optimization, especially for VMware.
Oh, the times they are a-changing. Now that you can deliver HD TV
live over the internet, the Cube has broadcast from a number of industry
shows and user conferences. The great part is that it is like
watching a sporting event covered by ESPN, but for tech.
The Cube brings all of the highlights of these events right onto your
computer screen. Now if you can’t make an event, no problem, you can
catch all the most important messages from the Cube. The Cube is now
the new mechanism for delivering content to users in the way they want
to receive it: TV. For more, check out www.siliconangle.tv
#2 Storage Optimization – Industry Spotlight
In the Storage Optimization industry spotlight, Dave Vellante and his
co-host John Furrier spent the first 15 minutes teeing up the concept. They
discussed storage optimization, where it has come from and where it is going,
especially in VMware environments. We are hearing more and more about
storage efficiency technologies. During the next 15 minutes Dave and I
discussed the 5 essential storage efficiency technologies including:
- Thin Provisioning
We also discussed the fact that the IBM Real-time Compression
technology is the most efficient and effective compression
technology in the industry, and that IBM really acquired not
just a real-time “compression” technology but a platform that can do a
number of things in real time. In fact, the 5 IBM storage efficiency technologies all operate in real time, which is the most effective approach for customers.
We have been hearing a great deal about storage optimization in
VMware environments because virtualizing servers was
successful for the server side of the house but didn’t do all it set
out to do: it didn’t fix the overall IT budget.
Virtualizing servers only pushed the financial problem to the storage
side of the house. Users have told us that when they virtualize their
servers, storage grows as much as 4x. By leveraging the right storage
optimization technologies together, users can get their budgets back
under control and also deliver on the promise of server
virtualization.
#3 More Free Time for “Real-life”
While on the Cube as a panelist with my good friend Marc Farley
(HPsisyphus, formerly @3ParFarley), Dave asked us what was the most
interesting thing we saw on the show floor while walking around. I
didn’t hesitate in my response. There were two in my mind. First, it
couldn’t be any more obvious how fast data is growing. Over 50% of
the 19,000 people there had cameras and were taking pictures and video.
That data is going to be stored somewhere. Additionally, they had these
cameras for a reason. Either we have more bloggers and tweeters than
we know about, more marketing people are going to these events or more
people are using social media to inform and educate others. The way in
which users want to receive data is always changing and evolving, and at
least at VMworld 2011 we were delivering content in a number of ways
especially photos and video. All that data will end up in the “cloud”
The second thing I noticed was the amount of free time VMware has
given back to the IT user. I heard, on more than one occasion, end
users talking about family, vacations and travel instead of the usual
banter about how challenging their jobs are and the issues they have
with their vendors, which is the normal thing I hear at these shows.
This was not an anomaly. I am chalking it up to the fact that VMware
makes people’s lives easier.
#4 Proximal Data
These “most interesting things” are not in any particular order. I say this because I believe that Proximal Data is THE
most interesting thing I saw at the show. Now Proximal Data just came
out of “stealth” in early August. They didn’t have a booth at VMworld
but they did have a “whisper suite”. So, I have to confess, since I
used to be an analyst, sometimes people will ask me to come take a look
at their technology and their message to see if it is in line with what
is going on in the industry so I got to hear the pitch.
Proximal Data’s message is right on. It hits a very important and
growing topic with VMware these days, the I/O bottleneck on virtual
servers, and they solve this problem in a very unique and intelligent
way.
First, the problem. One of the issues facing VMware today is the
number of virtual machines that can be hosted by one physical machine.
The more users can get on one system, the more efficient they can be.
The problem is, today systems are running into I/O workload bottlenecks
that are causing a limitation in the number of virtual machines one
system can run.
One way to solve this problem is to add more memory to the host, but that
could be very, very expensive. You can add more HBAs or NICs, but
that can be expensive and also difficult to manage. You can add more
flash cache to your storage to improve the I/O bottleneck, but doing that
only solves half the problem; you still need to solve the challenge on the
host side, again with memory or host adapters.
The solution: Proximal Data, which combines advanced I/O management
software capabilities with PCI flash cards on the host, for a
very reasonable price per host. The software combined with the card is
100% transparent to both the virtual servers and to the storage, which
to me is one of the most important features of the implementation.
Transparency is the key to any new technology. IT has a ton of
challenges and has done a great deal of work to get their environment to
where it is today. To implement a technology that causes all of that
work to be undone is very painful. Remember, the hardest thing to
change in IT is process, not technology. It’s important to preserve the
process. That is what Proximal Data does. Proximal Data can increase
the I/O capability of a VMware server with just a 5 minute installation
of the PCI card and their software. This technology can double and even
triple the number of virtual machines on any physical server and that
is a tremendous ROI. A new win for efficiency.
There are a number of folks entering this market these days; however
Proximal does it transparently with no agents making it the most user
friendly implementation. While these guys won’t have product until
2012, when it hits the market, I am sure it will be very successful.
#5 Convergence to the Cloud
Are we seeing the coming of the “God Box”? A number of vendors are
talking more and more about, and investing in, public / private cloud.
There are more systems popping up that have servers, networks, high
availability and storage all in one floor tile. These systems are
designed to integrate, scale, manage VM’s simply, increase productivity
and ease the management of all possible application deployments in any
business. Additionally these boxes help you to connect to the cloud to
ease the cost burden. Is the pendulum swinging back to the “open
systems” mainframe? Only time will tell.
One more for fun. The first meeting I had at VMworld was with a
potential OEM prospect of the IBM Real-time Compression IP. I have
always said that this technology could revolutionize the data storage
business much like VxVM did for Veritas many years ago. Creating a
standard way to do compression across a number of systems can help users
with implementation as well as ease the storage cost burden. I hope
this moves forward and I hope more folks step up who want to OEM the
by Steve Kenniston
The first city on my Eastern European trip was Moscow. I think the
traffic here is worse than the 101 in Silicon Valley during the dot com
era. That said, it was a great visit. I spoke at the Information
Infrastructure Conference at the Swissotel convention center in Moscow.
It was the first time I spoke to a group of people with an
interpreter. It was like being at the UN. The two main topics were
Storage Efficiency and Real-time Compression.
I spoke with a few customers and the press, and in dealing with the
data growth challenges they wanted to know, “When it comes to big data,
what is next, is it ‘huge data’?” Data growth is clearly a concern.
Interestingly enough, though, most of the questions came around to my title of
“Evangelist”. One reporter told me, “If an Evangelist is ‘preaching the
word of storage’ then why not just call yourself an Apostle?” How do
you think that would look on an IBM business card: Global Storage
The next day I did a day of “sales enablement” in the Moscow office.
We discussed mostly how to sell and position Real-time Compression and
what is next for the technology. I was very impressed with the team.
They were very technical and knew quite a bit about Real-time
Compression and really wanted to know in more detail how the technology
was invented. This means they are really talking about the technology
and the customers are drilling down into the next level of detail.
There are a lot of good opportunities for the technology in Moscow and I
look forward to hearing more about the success of Real-time Compression
I didn’t have a lot of time to sightsee but I did make it to Red
Square. You can actually buy a beer outside in Red Square and walk
around. So I did. I took a few photos and then as the US was getting
going, I had some work calls to attend to. That evening I spent on the
34th floor of my hotel having dinner. It was a great view of Moscow. I hope to come back.
Next stop, Warsaw, Poland. Stay tuned.
by Steve Kenniston
After landing in Warsaw, I got into a car with the local sales leader
for Poland and we drove to the event location. It was a 2 hour drive.
First, the roads and the land in Poland reminded me very much of my
hometown in Maine. Very scenic and rural but beautiful and peaceful.
We talked storage for 2 hours and I am always fascinated by the thirst
for knowledge I find when I travel. It was a great ride followed up
by a customer reception and some local Polish brew.
Thursday I spent the day in Sterdyn, Poland for IBM Storage
University. There were 30 customers at the event and it went very, very
well. The event was at Palac Ossolinski,
which is used today as an event center but has a very rich history; in fact, at
one point it was used as a medical facility in WWII. The photo is of
the building where we had the event. The topics we covered were:
- Storage Efficiency
- Real-time Compression
The customers were very interactive and provided a lot of insight to
their environments. Interestingly enough I learned during our customer
reception that IBM storage is #1 in Poland with HP second and EMC
third. This is a true testament to the IBM sellers and the customers
who use the IBM products every day to drive their business. I also
learned that the data breakdown in Poland is 90% block, 10% file, which I
found interesting, and I would be interested to check back 12 months from
today to see how it will be different.
I did learn something very interesting in Poland. The question was
asked, “Why XIV? What is so special about XIV?” The answer was
awesome. The answer started with 2 questions:
1) How old is RAID?
2) How old is your iPhone?
The reality is data growth is outpacing what traditional RAID can
handle and data profiles are changing as well. These combined have
driven new technologies like Cleversafe, Cloud Computing, Hadoop and
XIV. Just like the iPhone is a new approach to the smart phone based on
new things we know about how these smart phones are being used, we know
more about how data and storage are being used. New ways to deliver
capacity and performance are needed in order to keep up with the
changing times. I thought it was a very good answer in terms that make
Thursday evening I traveled back to Warsaw where I got in a bit late
and just went to a local pub, Sketch. Grabbed a small bite and some
local mead and then headed back to the hotel. I did get to see the
local Palace of Culture and Science in the middle of Warsaw, very
impressive, built as a gift from Russia to Poland.
I have an early flight to Prague. I am very excited about this part
of the journey as I have always wanted to travel to Prague. Press
meeting right when I land. Stay tuned.
by Steve Kenniston
Alright, landed safe in Prague and was picked up by one of my
colleagues and whisked away to the IBM office. There we did an
interview with Czech writer Martin Noska from Computerworld for IDG in
Czech Republic. The first thing Noska informed me was that IBM is number
one in storage sales in Czech Republic (just like Poland!). He also had
some very good questions and he led with “What are IBM’s biggest challenges
in the storage business?” I had thought about this for a while and I
would have to say it is really about marketing our storage “solutions”
to the customer base. IBM’s size is a double-edged sword. IBM is so big and
has so many products it becomes difficult to market or message all of
our products without inundating all of our customers and confusing
them. If you think about it, IBM has hundreds of thousands of customers
and business partners, if not more. This is one of our strengths.
When customers have needs or requirements we have very good input into
our product portfolio, perhaps the best in the business. Combine this
with the fact that IBM has not only storage solutions but technology
across the entire stack from servers to networking. So when it comes to
developing the right technology, that solves real customer problems, I
would argue that IBM’s portfolio is the best in the business. IBM takes
an extreme amount of care when developing a solution to ensure that it
matches the customer requirements based on the changing needs of IT.
Having an integrated portfolio that works well with our ISV partners,
VMware for example, allows us to help customers speed their time to ROI
and be very competitive in the marketplace. The challenge is, how do
we properly message our new solutions to our customers, in a timely
manner so that they are well aware of new products without giving them
too much information such that it just becomes noise? It is difficult
to say the least.
The interview went very well. There were questions about tape, where
we discussed the advantages of IBM’s LTFS technology for more advanced
tape usage, and we discussed the direction data deduplication will go as
well. Noska’s view was that there hadn’t been any advancement in data
deduplication in the last 5 years. I told him that for secondary
storage (backup) he is right. I also told him that the real
advancement in deduplication will come when it is ready for primary
storage. Today deduplication isn’t ready for primary, but it will be
On Monday the 13th we traveled to visit Avnet. They are a
great IBM partner. Like most partners they have a very large SMB
install base and also like a lot of SMB feedback I have been getting,
they are looking for a building block solution that has all of the
software features implemented as a part of the stack. SMB and
Enterprise alike are starting to realize that the value in any array is
becoming the software stack that makes the hardware efficient,
optimized, flexible, and dynamic. IT’s job continues to get more and
more challenging with developing strategic initiatives for the business
to make them more competitive and it is the job of the vendor to make
sure these solutions are as optimized and cost effective as possible.
We also visited DHL. These guys have one of the greatest datacenters
I have ever visited. They are very advanced and push a lot of data.
They do some very strategic logistics for a number of companies in Europe
and Asia. They, like many others, have a number of challenges. Given
my blog post about “The 5 Most Interesting things at VMworld”
(#4), I heard something very interesting today. I asked, “What is your
most challenging storage issue?” He told me that storage was not his
“most difficult” challenge. Storage efficiency was important to him in
order to keep driving down costs for his organization as they deliver a
service to the different groups that make up DHL, but his most difficult
challenge was with server I/O in his VMware environment. If you read
#4 in my post, regarding Proximal Data, this is exactly the issue they
address. As VM instances grow on the physical servers, the I/O starts
to become the big problem. DHL runs over 4000 instances of VMware and
as the business demands more applications and application resources,
they are bound by the I/O of the server, which also causes them to WAY
over provision their storage for performance reasons. This is very time
consuming, management intensive and expensive. The combination of a
solution like Proximal Data as well as compression can help them
optimize their infrastructure to save money and deliver better, more
cost effective services to their lines of business.
On the lighter side, I spent the weekend in Prague. What an amazing
city. The weather was fantastic and I was able to take a lot of great
photos. I walked around Prague Castle, ate some authentic Czech food,
visited the memorial for the Czech hockey players that passed in the
Russian plane crash and met some pretty interesting people. You can
check out some of my photos of Prague at www.facebook.com/skenniston.
Coincidentally the photo above shows the "Golden Lane" where the
Alchemists worked to turn anything they could find into gold in the city
History truly does repeat itself. We are talking about the history of
data storage. Every once in a while a new technology comes along that
requires a new way to think about infrastructure. Notice I said
“infrastructure”. I’d like to paint two analogies:
1: RAID – Prior to RAID users stored their data on disk and if they
could afford it, they backed that data up to have a protected copy of
their data. When RAID came out, users were able to store their data on
multiple disks appearing as one device. The benefits of this were
increased data reliability and better performance. This new technology,
however, fundamentally changed how disk was sold, but the questions were
- How much capacity do you need?
- What type of performance does your application require?
The sales rep’s point of view changed. There were a number of new
considerations that needed to be taken into account. First, the age-old
question, “Will I sell less storage ‘stuff’?” Remember the person, at
the time, selling the disk was probably also selling the backup tape and
software to protect that information. If the disks are more reliable,
maybe the customer won’t need as much tape? Second, when the capacity
question came up, the seller also needed to know what type of RAID the
customer wanted to ensure they sold them enough drives. It was no
longer as simple as asking the capacity requirements and dividing it by
the drive capacity at the time. Now depending upon RAID levels there
was a new set of math that needed to be done. Third was the notion of
performance and more spindles meant more performance so now that the
capacity equation was solved for, you also needed to know the I/O
requirements in order to make sure the right number of drives were sold
to solve for the capacity as well as the performance.
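The "new set of math" can be sketched in a few lines. This is a deliberately simplified illustration: it assumes a single RAID group, ignores hot spares and RAID write penalties, and the function names and the sample figures (1TB drives, 150 IOPS per drive) are assumptions, not numbers from any vendor.

```python
import math

def drives_for_capacity(usable_tb, drive_tb, raid_level):
    """Drives needed to deliver usable_tb in one RAID group (simplified)."""
    data_drives = math.ceil(usable_tb / drive_tb)
    if raid_level == 0:
        return data_drives            # striping only, no redundancy
    if raid_level in (1, 10):
        return 2 * data_drives        # every data drive is mirrored
    if raid_level == 5:
        return data_drives + 1        # one drive's worth of parity
    if raid_level == 6:
        return data_drives + 2        # two drives' worth of parity
    raise ValueError(f"unsupported RAID level: {raid_level}")

def drives_for_performance(required_iops, iops_per_drive):
    """Spindles needed to meet the I/O requirement."""
    return math.ceil(required_iops / iops_per_drive)

def drives_to_sell(usable_tb, drive_tb, raid_level, required_iops, iops_per_drive):
    """The larger of the capacity answer and the performance answer wins."""
    return max(
        drives_for_capacity(usable_tb, drive_tb, raid_level),
        drives_for_performance(required_iops, iops_per_drive),
    )
```

For example, under these assumptions 10TB usable on 1TB drives needs 11 drives at RAID 5, but if the application also needs 3,000 IOPS at 150 IOPS per drive, the performance requirement pushes the configuration to 20 drives.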
Guess what, we figured it out and the industry never looked back. RAID is a
de facto standard in all storage subsystems today; I even run RAID in my
home. The business benefits of having RAID far outweighed the costs.
In fact, it is probably one of the first times in storage history that
the question of, “how can you afford not to have it”, came up.
2: Virtual Machines – When VMware came out the value proposition was,
do more work, with less physical infrastructure. And again, the
business benefits far outweighed the technology hurdle of implementing
the new solution.
Keeping in mind that it is much harder to change process in IT than it is to
change technology, IT decided that this new way of serving up processing
power to applications was well worth all of the process changes that it
would require. One example, backup would need to change when
implementing virtual server technology. The data would grow 4x and the
processing of that information for backup would take longer, in a world
where time was all too valuable. However the business benefit justified
Again, the sellers’ questions were consistent:
- How many virtual servers do you need? (Capacity)
- What type of performance do you need for each virtual server?
The answers to these questions allowed a sales rep to configure the right
number of physical systems to handle the right number of virtual systems to make
the line of business successful. Additionally, some of the same
considerations came up. “Will I sell less server and make less money?”
Now that there was new server technology (more processors, the ability
to handle more memory) systems could be bigger, and more expensive.
Sellers also needed to know a bit more about “capacity”, how many
virtual systems could a physical system run successfully? They also
needed to have an understanding of performance. Now sellers were
configuring systems to run the equivalent of 20 to 100 servers on one
Today I would suggest that we are at a crossroads in history. New technology has come along that will have a significant impact
on the storage world. First, research from IBM reflects the fact that
disk drives can no longer keep getting two times as dense for half the
cost as they had been throughout the late 90’s and early 2000’s. The
technology doesn’t exist today to make the drives spin faster, stay cool
and not lose data. Until now. Real-time compression is
a game changing technology that will add significant value to the
storage industry without having to change the way IT thinks about the
deployment of their storage.
Data is growing at such a significant pace today and with the latest IBM
research about disk capacities, something needs to change. Data centers
are just running out of space and more customers want to keep more data
on line for reasons such as competitive edge or compliance, but no
matter the reason, they want access to their information. Enter
real-time compression. Now there is a fundamental difference between
real-time compression and other compression technologies and compression
implementations but I am not going to get into it here; it is safe to
say that post-process and in-line compression are very different from
real-time compression and users can’t get the benefits of improved
primary storage capacity, transparently, with no performance impact with
anything but real-time compression technology.
Real-time compression, like other game-changing technology, doesn’t
require any new questions; there is just simply a new set of math
- How much capacity is required?
- What is the performance requirement?
In time, real-time compression will be as ubiquitous as RAID, and just
like users don’t think that much about RAID, users won’t need to think
about compression. Compression will become an expected feature of the
array. It doesn’t matter that it now takes fewer drives to satisfy the
original question around capacity and performance. With data growing as
fast as it is and with disks not being able to keep up their growth
pace, something needs to change and that something is real-time
compression. Soon, the physical capacity of a disk drive won’t matter;
what will matter is a disk’s virtual capacity, the amount of data it is
capable of storing. It is time we all started
thinking this way.
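To make the idea of virtual capacity concrete, here is a minimal sketch of the arithmetic. It assumes compression savings are expressed as the fraction of each stored byte that is removed; the function name `virtual_capacity_tb`, the 2 TB drive size and the savings values are illustrative assumptions, not figures from any vendor tool.

```python
def virtual_capacity_tb(physical_tb: float, savings: float) -> float:
    """Return how much logical data fits on `physical_tb` of disk.

    `savings` is the assumed fraction of the original data removed by
    compression (0.5 means the data shrinks to half its size).
    """
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in the range [0, 1)")
    # Each stored byte occupies (1 - savings) bytes on disk, so the
    # drive holds physical / (1 - savings) bytes of logical data.
    return physical_tb / (1.0 - savings)


if __name__ == "__main__":
    # Hypothetical 2 TB drive at two assumed savings levels.
    for savings in (0.5, 0.8):
        virtual = virtual_capacity_tb(2.0, savings)
        print(f"{savings:.0%} savings: about {virtual:.1f} TB virtual capacity")
```

Under these assumptions the same physical drive is sized by what it can hold after compression rather than by its raw capacity, which is the shift in thinking described above.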
“Storage Efficiency” has become a big topic over the past 12 months. There are a number of new technologies that have come out in the last few years that are helping to deal with storage growth. We all know that data is the root of the decisions that drive business today. The more data you have, hopefully, the better decisions you can make to drive your business to success. The question is, “what is the value (and hence the cost) of the infrastructure to create that success?” What we do know is that the ability to put more data in a highly efficient footprint can give your company a competitive edge. There are five technologies that can help an IT organization create an efficient storage infrastructure. These are:
1) Tiering
2) Virtualization
3) Thin Provisioning
4) Data Deduplication
5) Real-time Compression
It is also important to point out that there are some semantics when talking about storage efficiency, specifically between efficiency and optimization technologies. I think it is useful to attempt to define these as they lead us to picking the right solutions for what we are trying to accomplish. For the purpose of this post, efficiency will relate to making existing capacity more useful and optimization will mean making more capacity out of existing capacity.
Using these definitions, technologies such as Tiering, Virtualization and Thin Provisioning are efficiency technologies. These technologies help to utilize the existing capacity that you have.
Tiering is technology that is used on about 10% of your data or less. It is used to move data that requires higher performance to flash storage. Good tiering technology analyzes data access patterns and moves the most active data to the highest performing disk. It doesn’t really change the amount of physical capacity that is required; it just changes what type of capacity is required and allows IT to make sure data is operating as fast and efficiently as possible.
Virtualization technology allows IT to make sure disk capacity is used as efficiently as possible. Until recently, storage utilization rates were around 50%. By leveraging virtualization technology, IT can pool storage so that it doesn’t need to purchase capacity needlessly. Virtualization can be used on 50% to 60% of your storage, but it doesn’t change your physical capacity requirements; at most, it lets users take advantage of the 20% to 40% of their capacity that they previously couldn’t access.
Like virtualization, thin provisioning can be used on 50% to 60% of your capacity; however, it gives IT about 10% to 40% of its capacity back. Thin provisioning helps IT manage its existing capacity and utilization by making it much easier to make capacity available to users. Again, however, it doesn’t change the amount of physical storage infrastructure required.
Optimization technologies help IT to better manage its physical storage footprint. They optimize existing infrastructure by allowing users to put more capacity in the same physical space. The two optimization technologies in use today are data deduplication and real-time compression.
Optimization technologies are a bit tricky. They require a balance between optimization on one side and performance and availability on the other. At the end of the day, IT chooses the storage it buys with two very important characteristics in mind: performance and availability. Optimization technologies cannot affect these characteristics. It is for this reason that data deduplication really isn’t ready for “prime time” on primary, active storage: it creates too much of a performance impact on primary, active data. Today, data deduplication could be used on about 10% to 15% of the primary, less active capacity in the data center, and it only provides about 30% to 50% overall optimization. In other words, deduplication technology can reduce the physical infrastructure by as much as 10%, meaning IT may not need to buy as much physical capacity.
Real-time compression, on the other hand, has one of the most dramatic effects on primary storage capacity. It can be used on as much as 85% of the storage footprint and can compress data by 50% to 80%. As a result, real-time compression could let IT purchase as much as 70% less overall storage capacity. It also does not affect the main characteristics for which users buy storage (performance and availability). IT could have as much as 70% less footprint but keep the same amount of data or more on-line. Additionally, IT can now purchase storage opportunistically, without such a dramatic impact on infrastructure, process or budgets. This allows companies to keep more capacity on-line and available, do more analytics on more capacity, and become more competitive.
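One way to see where the "as much as 10%" and "as much as 70%" figures might come from is to multiply the share of capacity a technology can be applied to by the reduction it achieves on that share. This is a rough sketch of that reading, not a vendor sizing method; the function name `footprint_reduction` and the choice of best-case inputs are assumptions made for illustration.

```python
def footprint_reduction(coverage: float, reduction: float) -> float:
    """Fraction of total physical capacity saved.

    `coverage` is the share of capacity the technology can be applied to,
    `reduction` is the space saved on that share; both are fractions in [0, 1].
    """
    for name, value in (("coverage", coverage), ("reduction", reduction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return coverage * reduction


# Best-case figures quoted in the text above (upper end of each range).
SCENARIOS = {
    "data deduplication": (0.15, 0.50),
    "real-time compression": (0.85, 0.80),
}

if __name__ == "__main__":
    for name, (coverage, reduction) in SCENARIOS.items():
        saved = footprint_reduction(coverage, reduction)
        print(f"{name}: up to {saved:.1%} less physical capacity")
```

With the upper ends of the quoted ranges, the product is roughly 7.5% for deduplication and 68% for real-time compression, which is in line with the rounded figures in the text.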
When deciding which storage efficiency technology will have a more effective impact on your overall environment and budget, start with optimization technologies and start to get the data growth under control. Adding value to the line of business that can drive revenue with more data will make you a hero and your business more successful.
New SONAS release offers enhanced performance
Businesses continue to search for storage solutions that save money
without sacrificing performance. Last year, IBM introduced Scale Out
Network Attached Storage (SONAS), the industry’s first such
network-attached storage (NAS) offering to address this business need.
SONAS is an enterprise-class NAS system that provides extreme
scalability, availability and security—and does so with record-breaking
performance. It’s designed as a single global repository to manage
multiple petabytes of storage and billions of files all under one file
In April, IBM announced significant performance enhancements to
SONAS: improved information lifecycle management (ILM), hierarchical
storage management (HSM) as well as ease of deployment and antivirus
Todd Neville, SONAS program leader at IBM, says SONAS is unique in
that it can scale near-linearly to almost any performance level.
With SONAS, he says, “You can build a system that’s as fast as you want
it to be; but it’s not just about absolute size, it’s also about bang
for your buck. We’ve significantly increased the software performance in
our upcoming release 1.2, so customers see a significant performance
increase on their current platform with no additional costs.”
Funda Eceral, SONAS market segment manager at IBM, says SONAS is the
only true scale-out NAS system available in the marketplace. “While you
can nondisruptively add capacity with storage building blocks,” Eceral
says, “you can also still continue to independently scale out your I/O
performance with interface nodes. It brings operational efficiency and
extraordinary utilization rates for each customer.”
Three Key Features
This version of SONAS offers three key features, according to Neville:
- Ease of deployment. Using Network Data Management Protocol
(NDMP), a SONAS device can be easily integrated into existing
data-center backup infrastructures. “If you have an enterprise backup
deployment using NDMP, you will be able to take SONAS and quickly
connect with a wide variety of popular backup systems,” Neville says.
- Built-in antivirus integration. Scalable NAS storage devices
must have a way for an antivirus function to perform scans on files
intelligently, such as when they’re opened or closed. SONAS includes a
built-in functionality that lets a third party like Symantec integrate
into the SONAS device to perform antivirus operations, as simple “full
file-system scans” become cumbersome at enterprise scales.
- Physical size. Neville says customers asked IBM to make the
SONAS device more compact, although it supports almost a full petabyte
in a single rack, making it the only offering in IBM’s NAS portfolio
that can do so. It’s now 10 inches shorter than the original device, can
scale up to 14.4 petabytes (with 2 TB drives) and has a single point of
management, which can significantly reduce storage-administration
“Everyone says, ‘We do tiering, HSM and ILM,’ but design
matters—IBM does it differently.” —Todd Neville, SONAS program leader,