Solid-state drives: Changing the data world

Say hello to your new friend

Paul Pendle, Consulting Systems Integration Engineer, EMC Corporation
Paul Pendle is a Consulting Systems Integration Engineer with the EMC Corporation in Massachusetts. He is a 31 year veteran of the IT industry with a variety of experience in systems programming and database administration. Paul has worked with DB2 on z/OS (MVS) since DB2 V1.2 and speaks regularly at DB2 user conferences on both DB2 for z/OS and DB2 for Linux, UNIX and Windows. Paul is a published author on integration of EMC products with DB2.

Summary:  Solid-state drives represent a quantum leap in enterprise storage performance over hard disk drives, including high-performance Fibre Channel drives. With these new drives, you may need to reevaluate some of the ways that you deal with databases, as the move from spinning disks to solid-state technology will change the rules. In this article, learn about some of the new considerations. This content is part of the IBM Data Management magazine.

Date:  01 Aug 2011
Level:  Introductory
Also available in:   Chinese  Korean


It cannot have escaped your notice, if you have been reading industry articles and press releases, that solid-state drives (SSDs) are now available for the leading enterprise storage arrays. Disk storage companies are pitching these drives as a quantum leap in enterprise storage performance, and they are. The sales literature primarily demonstrates SSD performance advantages over hard disk drives (HDDs) but also includes information about additional benefits, such as power and cooling savings, reliability, cost per I/O, and so on.

Even if you strip away the marketing hype, the performance advantages make a strong argument that SSDs will completely replace Fibre Channel (FC) drives as the primary storage technology in high-end storage systems. Price is likely to be an issue for a while, but ultimately, the end of FC drives is in sight. (Who remembers the arrival of the CD? It was the first nail in the coffin of vinyl, yet vinyl took many years to leave the mainstream.) The end of FC drives might seem like an ordinary technology transition, but something out of the ordinary is occurring. SSDs may cause you to reevaluate some of the ways that you deal with databases, because the move from spinning disks to solid-state technology will change the rules.

I/O cost and the slowly spinning disk

Clustering. Defragmenting. Reorganizing. Buffering. A vast number of common database tasks and strategies are designed to do one thing only: minimize disk I/O. That’s because HDD I/O is usually the most time-consuming part of any database transaction. If we put it in terms of total time costs, HDD I/O is fantastically expensive.

When you move to SSDs, I/O operations become much faster, but most people don’t realize how large the difference is. For example, take a single, random 4K-page read. From a 15,000 rpm FC HDD, the average response time is approximately 6.5 ms. With an SSD in an enterprise-class storage array, a reasonable estimate for the same read is 1 ms. In other words, you might expect an SSD to complete roughly six I/Os in the time it takes an FC HDD to finish one.
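The arithmetic behind that comparison is simple enough to sketch. The Python below uses only the two averages quoted above (6.5 ms for the FC HDD, 1 ms for the SSD); the constant and function names are illustrative, and real response times vary with the array, cache hits, and queue depth.

```python
# Average response times for one random 4K-page read, as quoted in the text.
HDD_READ_MS = 6.5  # 15,000 rpm FC HDD
SSD_READ_MS = 1.0  # SSD in an enterprise-class storage array


def serial_read_time_ms(reads: int, per_read_ms: float) -> float:
    """Elapsed time when reads are issued one after another, with no overlap."""
    return reads * per_read_ms


def ssd_reads_per_hdd_read() -> float:
    """How many SSD reads complete in the time of one HDD read."""
    return HDD_READ_MS / SSD_READ_MS


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        hdd = serial_read_time_ms(n, HDD_READ_MS)
        ssd = serial_read_time_ms(n, SSD_READ_MS)
        print(f"{n:>6} reads: HDD {hdd:,.1f} ms, SSD {ssd:,.1f} ms")
    print(f"SSD reads per HDD read: {ssd_reads_per_hdd_read():.1f}")
```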

But there’s more to the story. SSDs can service requests concurrently; HDDs cannot. The best you can reasonably expect from a 15,000 rpm FC HDD is about 200 random 4K-page reads per second. Because read requests to an SSD can overlap, an SSD delivers more like 5,000 random 4K-page reads per second. Measured by throughput, then, a single I/O on an SSD costs about one twenty-fifth as much as on an HDD (5,000 ÷ 200 = 25).
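A short sketch makes the throughput comparison concrete. It uses the 200 and 5,000 reads-per-second figures from the paragraph above, plus the 1 ms SSD response time quoted earlier. The last function applies Little's law (average requests in flight = throughput × response time), a standard queueing identity, to estimate roughly how many reads must overlap to sustain the SSD rate; treat it as a rough model, not a property of any particular array.

```python
# Random 4K-page read rates quoted in the text.
HDD_IOPS = 200       # 15,000 rpm FC HDD
SSD_IOPS = 5_000     # SSD, with overlapping read requests
SSD_READ_S = 0.001   # 1 ms average SSD response time


def batch_time_s(reads: int, iops: float) -> float:
    """Seconds to finish a batch of random reads at a sustained rate."""
    return reads / iops


def throughput_ratio() -> float:
    """How many times more random reads per second the SSD delivers."""
    return SSD_IOPS / HDD_IOPS


def avg_in_flight(iops: float, response_s: float) -> float:
    """Little's law: average requests in flight = throughput x response time."""
    return iops * response_s


if __name__ == "__main__":
    n = 10_000
    print(f"{n} reads: HDD {batch_time_s(n, HDD_IOPS):.0f} s, "
          f"SSD {batch_time_s(n, SSD_IOPS):.0f} s")
    print(f"throughput ratio: {throughput_ratio():.0f}x")
    print(f"SSD reads in flight: ~{avg_in_flight(SSD_IOPS, SSD_READ_S):.0f}")
```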

With SSDs, disk I/O doesn’t cost a little less; it costs a lot less. But is the reduction enough to change how you build and manage your databases? In some cases, yes. (Note: This article chiefly addresses online transaction processing [OLTP] access. Sequential access, typical of data warehouse activities, gains less from SSDs than random access does.)


Cluster’s last stand?

Database clustering stores DB2 rows in the sequence of a clustering index, so that rows likely to be accessed together share a page and related pages sit close together on disk. A successful clustering strategy can reduce the number of DB2 GETPAGEs (and also I/Os), especially if many sequential scans use the clustering index. Because DB2 can then read data in the order in which it needs to process it, clustering also helps avoid expensive SORTs. Sequential scans are typical in decision support system (DSS) applications but rare with OLTP access patterns.

The idea of keeping pages adjacent to reduce disk head (arm) movement, and thus latency during I/O, does not apply to SSDs, of course. When data appears to be physically adjacent in DB2 (that is, on consecutive pages in the table space), it is very unlikely to be in consecutive cells on the SSD, because wear-leveling algorithms distribute the data evenly across the SSD’s capacity. In fact, the exact location on the SSD hardly matters, because the latency to retrieve the data is an order of magnitude (and in some cases, two orders of magnitude) lower than the latency of a random read on a spinning disk.
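To see why logically consecutive pages scatter, here is a toy flash translation layer (FTL) in Python. It is a deliberate simplification and not how any particular SSD's firmware works: every write goes out of place to the least-worn free block, and the superseded block is erased at once and returned to the pool. ToyFTL, its methods, and the demo sizes are all hypothetical.

```python
class ToyFTL:
    """Toy flash translation layer: maps logical pages to physical blocks.

    Hypothetical simplifications, not any vendor's firmware: every write goes
    out of place to the least-worn free block, and the superseded block is
    erased immediately and returned to the free pool.
    """

    def __init__(self, physical_blocks: int) -> None:
        self.erase_counts = [0] * physical_blocks
        self.free = set(range(physical_blocks))
        self.mapping: dict[int, int] = {}  # logical page -> physical block

    def write(self, logical_page: int) -> int:
        # Wear leveling: prefer the free block with the fewest erases
        # (ties broken by lowest block number to keep the demo deterministic).
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        old = self.mapping.get(logical_page)
        if old is not None:
            # Erase the stale copy and make it available again.
            self.erase_counts[old] += 1
            self.free.add(old)
        self.mapping[logical_page] = target
        return target


def demo() -> list[int]:
    ftl = ToyFTL(physical_blocks=16)
    for page in range(8):  # initial load: logical pages 0..7
        ftl.write(page)
    ftl.write(3)           # update two of the pages
    ftl.write(5)
    return [ftl.mapping[p] for p in range(8)]


if __name__ == "__main__":
    print(demo())  # [0, 1, 2, 8, 4, 9, 6, 7]
```

After just two updates, logical pages 0 through 7 no longer occupy consecutive physical blocks.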

Does clustering (or, for that matter, poor clustering) really matter when using SSDs? Because wear leveling makes it unlikely that consecutive data pages sit in adjacent cells, do you need to group data on the media at all? It’s an interesting question. If two rows are clustered together on the same page, requests for both of them might be satisfied by a single I/O, provided the requests arrive close together in time. If the data is unclustered, that single I/O might not occur: the two rows may be on separate pages, resulting in two separate GETPAGEs and two separate synchronous I/Os. That raises the question: is the cost of the extra GETPAGEs and related synchronous I/Os punitive when using SSDs?
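The question can be put into numbers with a small model. The sketch below counts one synchronous read per distinct page touched, assuming an empty buffer pool and that a page stays buffered while the requests run, and prices those reads with the 6.5 ms and 1 ms figures quoted earlier. ROWS_PER_PAGE, TOTAL_PAGES, and the scattering formula are hypothetical stand-ins for a well-clustered and a poorly clustered table, and CPU cost per GETPAGE is ignored.

```python
HDD_READ_MS = 6.5   # random 4K read, 15,000 rpm FC HDD (from the text)
SSD_READ_MS = 1.0   # random 4K read, SSD (from the text)
ROWS_PER_PAGE = 20  # hypothetical
TOTAL_PAGES = 50    # hypothetical


def clustered_page(key: int) -> int:
    """Well clustered: rows with adjacent keys share a page."""
    return key // ROWS_PER_PAGE


def scattered_page(key: int) -> int:
    """Poorly clustered (hypothetical layout): adjacent keys land on different pages."""
    return (key * 7) % TOTAL_PAGES


def sync_reads(keys, page_of) -> int:
    """One synchronous read per distinct page, assuming none were buffered."""
    return len({page_of(k) for k in keys})


def io_time_ms(keys, page_of, per_read_ms: float) -> float:
    """I/O time for the requests, ignoring CPU cost per GETPAGE."""
    return sync_reads(keys, page_of) * per_read_ms


if __name__ == "__main__":
    pair = [100, 101]  # two rows requested close together in time
    for label, page_of in (("clustered", clustered_page), ("scattered", scattered_page)):
        print(f"{label}: {sync_reads(pair, page_of)} read(s), "
              f"HDD {io_time_ms(pair, page_of, HDD_READ_MS)} ms, "
              f"SSD {io_time_ms(pair, page_of, SSD_READ_MS)} ms")
```

For the pair of rows, the scattered layout on SSD (2 reads, about 2 ms) still beats the clustered layout on HDD (1 read, about 6.5 ms). For a long range scan, the number of reads in the scattered layout grows with the length of the scan, which is why clustering still matters for sequential access.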

Purists might argue that the extra GETPAGEs cost CPU and channel resources, and they would be right. However, given the speed of SSDs and the number of REORGs that can be avoided, this extra cost is easily justified.


Embedding free space

When you create a table space, you frequently embed free space so that an application can insert rows in clustering sequence and rows can grow in size as they are updated. The extra space can reduce overflows and index page splits. Most sites reserve about 10 percent of the table space as free space, but you will often see more with highly volatile applications.

However, reserving free space only trades disk space for time, potentially letting you go longer between REORGs. So consider embedding less free space when using SSDs, since the clustering sequence may not be so important.
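To see what free space costs in disk space, the sketch below estimates the pages needed to hold a table after a load or REORG. It models free space as a percentage of each page left empty (in DB2 terms, PCTFREE) and ignores FREEPAGE, page overhead, and compression; the row count and rows-per-page figure in the example are hypothetical.

```python
def rows_per_page(rows_per_full_page: int, pct_free: int) -> int:
    """Rows placed on each page when pct_free percent of it is left empty."""
    return rows_per_full_page * (100 - pct_free) // 100


def pages_needed(rows: int, rows_per_full_page: int, pct_free: int) -> int:
    """Pages required to hold all rows at the given free-space setting."""
    per_page = rows_per_page(rows_per_full_page, pct_free)
    return -(-rows // per_page)  # ceiling division


if __name__ == "__main__":
    rows, full = 1_000_000, 40  # hypothetical table: 40 rows fit on a full page
    for pct in (0, 10, 20):
        print(f"free space {pct:>2}%: {pages_needed(rows, full, pct):,} pages")
```

With these hypothetical numbers, going from 10 percent to no free space saves 2,778 of 27,778 pages, roughly 10 percent of the space.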

If you decide to forgo free space in the table space, you might want to consider using APPEND YES for the tables in the table space. This option shortens the code path that DB2 must traverse to find a location for an inserted row and also avoids page overflows on INSERTs. On the downside, you need to consider concurrency. Multiple threads executing INSERTs and competing for locks and latches on the same page can be costly, especially in a data sharing environment (although MC00, that is, MEMBER CLUSTER with PCTFREE 0 and FREEPAGE 0, may solve this problem).


Buffer pools

DBAs use buffer pools to keep the most recently used DB2 pages in memory, hoping that the pages will be reused so that I/O is avoided. As they say, the best I/O is the one that does not happen. Because SSDs bring down the cost of an I/O, large buffer pools are less critical for data stored on SSDs.

A potential course of action here is to reduce the size of the buffer pools supporting the table spaces resident on SSDs. It is likely that you will have a mixture of HDDs and SSDs, so you can allocate the buffer pool space thus saved to the table spaces on HDDs, which need it more.
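The trade-off can be estimated by pricing buffer pool misses at each device's read time. In the sketch below, the GETPAGE counts and hit ratios before and after moving buffer space from SSD objects to HDD objects are purely hypothetical; in practice you would take them from your own buffer pool statistics. Only the 1 ms and 6.5 ms read times come from the text.

```python
from dataclasses import dataclass

HDD_READ_MS = 6.5  # random 4K read, 15,000 rpm FC HDD (from the text)
SSD_READ_MS = 1.0  # random 4K read, SSD (from the text)


@dataclass
class PoolWorkload:
    getpages: int    # page requests against objects on this device
    hit_pct: int     # percent of GETPAGEs satisfied from the buffer pool
    read_ms: float   # time for one synchronous read on a miss

    def io_wait_s(self) -> float:
        """Seconds spent waiting for synchronous reads on buffer pool misses."""
        misses = self.getpages * (100 - self.hit_pct) // 100
        return misses * self.read_ms / 1000


def total_wait_s(*pools: PoolWorkload) -> float:
    return sum(p.io_wait_s() for p in pools)


if __name__ == "__main__":
    # Hypothetical: shrinking the SSD pools lowers their hit ratio, while
    # giving that memory to the HDD pools raises theirs.
    before = total_wait_s(PoolWorkload(1_000_000, 90, SSD_READ_MS),
                          PoolWorkload(1_000_000, 80, HDD_READ_MS))
    after = total_wait_s(PoolWorkload(1_000_000, 80, SSD_READ_MS),
                         PoolWorkload(1_000_000, 88, HDD_READ_MS))
    print(f"I/O wait before: {before:.0f} s, after: {after:.0f} s")
```

In this made-up example, the SSD objects lose 100 seconds of I/O wait to the smaller pools, but the HDD objects gain back 520 seconds, because each avoided HDD miss is worth 6.5 times an SSD miss.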


REORGs

What does a DBA do today to keep data and indexes organized? Several things, actually, but REORGs may be the most significant activity. REORGs accomplish a number of goals, many of which are related to disk I/O:

  • Place the data in a clustering sequence
  • Recapture lost space due to row deletions
  • Reinsert free space into the table space
  • Reduce leaf levels or pages in indexes
  • Reduce fragmentation in indexes
  • Reduce the number of extents of the table or index space

DB2 does its level best to keep rows close to their optimal location in the clustering sequence. But this is not always possible. The page might lack space for the optimal placement of the row on an INSERT, or a row might grow beyond the space allotted for it on an UPDATE and so must be moved. And there are many other reasons. Ultimately, over time, the CLUSTERRATIO of the data decreases from 100 percent to a lower value. How quickly it decreases depends on the volatility of the data.
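The gradual decline in clustering can be imitated with a toy model. The sketch below loads rows in key order leaving one free slot per page, then inserts new rows: each goes to the page of its nearest lower key if there is room, otherwise to a new page at the end. Its cluster_ratio function is a deliberately simplified stand-in for CLUSTERRATIO (the share of key-order neighbors whose page does not go backward); it is not DB2's formula, and the placement rule and all names here are hypothetical.

```python
from collections import Counter


def load(keys, capacity: int, free_slots: int):
    """Initial load or REORG: rows in key order, free_slots left empty per page."""
    per_page = capacity - free_slots
    page_by_key = {k: i // per_page for i, k in enumerate(sorted(keys))}
    used = Counter(page_by_key.values())
    return page_by_key, used


def insert(key: int, page_by_key: dict, used: Counter, capacity: int) -> int:
    """Place the row on its clustering page if it has room, else on a new last page."""
    lower = [k for k in page_by_key if k < key]
    target = page_by_key[max(lower)] if lower else 0
    page = target if used[target] < capacity else max(used) + 1
    page_by_key[key] = page
    used[page] += 1
    return page


def cluster_ratio(page_by_key: dict) -> float:
    """Percent of rows, in key order, not stored on an earlier page than the previous row."""
    keys = sorted(page_by_key)
    if len(keys) < 2:
        return 100.0
    in_order = sum(1 for a, b in zip(keys, keys[1:])
                   if page_by_key[b] >= page_by_key[a])
    return 100.0 * in_order / (len(keys) - 1)


if __name__ == "__main__":
    pages, used = load(range(0, 80, 10), capacity=4, free_slots=1)
    print(f"after load: {cluster_ratio(pages):.1f}%")
    for k in (5, 15, 35, 45):
        insert(k, pages, used, capacity=4)
    print(f"after inserts: {cluster_ratio(pages):.1f}%")
```

Two of the four inserts find room on their clustering page; the other two overflow to new pages, and the simplified ratio drops from 100 percent to about 82 percent.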

You need to monitor the system (ideally in an automated fashion) to determine when a REORG needs to be run. All kinds of DB2 catalog statistics describe the table space condition, and IBM suggests thresholds as to when the REORG should occur. There are many downsides to REORGs:

  • They must be scheduled and monitored.
  • They consume a large amount of I/O and CPU resources.
  • They can reduce concurrency and availability when executed against a live table or index space.
  • They can increase logging during the REORG process.
  • They flood wide area networks (WANs) with changed data traffic when disaster recovery replication is used.
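Automated monitoring of this kind usually comes down to comparing statistics against thresholds. The sketch below shows the shape of such a check. The statistics fields are loosely modeled on the kinds of values DB2 catalog and real-time statistics record, but the field names and threshold values are hypothetical placeholders, not IBM's recommended values; substitute the statistics and thresholds your site actually uses.

```python
from dataclasses import dataclass


@dataclass
class TableSpaceStats:
    total_rows: int
    inserts_since_reorg: int
    deletes_since_reorg: int
    cluster_ratio_pct: float
    extents: int


@dataclass
class ReorgThresholds:
    max_changed_pct: float = 10.0        # placeholder value
    min_cluster_ratio_pct: float = 80.0  # placeholder value
    max_extents: int = 50                # placeholder value


def reorg_reasons(s: TableSpaceStats, t: ReorgThresholds) -> list[str]:
    """Return the reasons a REORG looks warranted (an empty list means none)."""
    reasons = []
    if s.total_rows > 0:
        changed = 100.0 * (s.inserts_since_reorg + s.deletes_since_reorg) / s.total_rows
        if changed > t.max_changed_pct:
            reasons.append(f"{changed:.1f}% of rows inserted or deleted since last REORG")
    if s.cluster_ratio_pct < t.min_cluster_ratio_pct:
        reasons.append(f"cluster ratio {s.cluster_ratio_pct:.0f}% "
                       f"below {t.min_cluster_ratio_pct:.0f}%")
    if s.extents > t.max_extents:
        reasons.append(f"{s.extents} extents exceed {t.max_extents}")
    return reasons


if __name__ == "__main__":
    stats = TableSpaceStats(1_000_000, 150_000, 0, 75.0, 20)
    print(reorg_reasons(stats, ReorgThresholds()))
    # Hypothetical looser thresholds, for example for a table space on SSDs.
    print(reorg_reasons(stats, ReorgThresholds(max_changed_pct=25.0,
                                               min_cluster_ratio_pct=60.0)))
```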

The last point may be the most significant if you are replicating your database over a long distance for disaster recovery. The writes generated by a REORG, while not transactional writes, are still a necessary evil that must be added to the “real” write workload being transmitted across the link. Because the cost of telecommunications lines is so high, you would hate to fill the pipe with such busywork that could be avoided or at least reduced.
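A back-of-the-envelope calculation shows why REORG traffic hurts on a replication link. The sketch assumes that a REORG rewrites roughly the full size of the table space and its indexes and that every rewritten byte is shipped to the remote site; the 200 GB object, the 1,000 Mbit/s link, and the half of the link left over after transactional writes are hypothetical.

```python
def transfer_seconds(data_gb: float, link_mbps: float, usable_fraction: float = 1.0) -> float:
    """Seconds to ship data_gb (decimal gigabytes) over a link of link_mbps
    megabits per second, of which only usable_fraction is free for this traffic."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    megabits = data_gb * 8_000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / (link_mbps * usable_fraction)


if __name__ == "__main__":
    # Hypothetical: a 200 GB table space plus indexes on a 1,000 Mbit/s link,
    # half of which is already needed for the transactional write workload.
    secs = transfer_seconds(200, 1_000, usable_fraction=0.5)
    print(f"REORG traffic occupies the spare bandwidth for {secs / 60:.0f} minutes")
```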

Deploying table spaces on SSDs can reduce the need to REORG. This approach can generate huge savings in management effort, CPU, I/O, and link bandwidth for remote replication. You just need to understand that the extra work DB2 must do to retrieve disorganized index or data pages from SSDs costs less than the I/O on a freshly REORGed table space on HDDs.
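The claim can be checked with a break-even calculation on I/O time alone. Using the 6.5 ms and 1 ms averages quoted earlier, a disorganized table space on SSDs can need up to about 6.5 times as many synchronous reads as a freshly REORGed one on HDDs before it costs more I/O time. The sketch ignores the extra CPU per GETPAGE that the purists point to, and the read counts in the example are hypothetical.

```python
HDD_READ_MS = 6.5  # random 4K read, 15,000 rpm FC HDD (from the text)
SSD_READ_MS = 1.0  # random 4K read, SSD (from the text)


def break_even_factor() -> float:
    """How many SSD reads cost the same I/O time as one HDD read."""
    return HDD_READ_MS / SSD_READ_MS


def ssd_unreorged_is_cheaper(hdd_reads_reorged: int, ssd_reads_unreorged: int) -> bool:
    """Compare I/O time only: disorganized data on SSD vs. REORGed data on HDD."""
    return ssd_reads_unreorged * SSD_READ_MS < hdd_reads_reorged * HDD_READ_MS


if __name__ == "__main__":
    print(f"break-even factor: {break_even_factor():.1f}")
    # Hypothetical: the REORGed HDD table space needs 1,000 reads for a workload;
    # the same workload against disorganized data on SSD needs 3,000 or 7,000.
    print(ssd_unreorged_is_cheaper(1_000, 3_000))  # True
    print(ssd_unreorged_is_cheaper(1_000, 7_000))  # False
```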


IBM trends

Conversations with people at IBM reveal a view within the company that DBAs may be able to reduce REORG frequency. This notion is independent of SSD implementations; it is more a reexamination of why REORGs are performed in the first place. For instance, does a large number of extents for a table space really have a measurable negative performance impact? In addition, the DSNACCOX stored procedure in DB2 10 has specific code to reduce REORG requirements for table spaces residing on SSDs.


Think differently about SSDs and DB2

Using SSDs to support a DB2 for z/OS OLTP subsystem can help you achieve the following:

  • Reduce the number of REORGs
    • Save expensive MIPS, disk, and channel resources
    • Reduce the costs of remote replication
    • Save personnel time managing the REORGs
  • Increase space utilization by embedding less free space
  • Improve buffer pool efficiency by dedicating minimal space to table spaces on SSDs and reallocating the space thus saved to HDD buffer pools

When you consider the purchase of SSDs for your enterprise-class storage array, think beyond speeds and feeds. Think about which tasks SSDs enable you to accomplish differently.
