Skip to main content

If you don't have an IBM ID and password, register here.

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. This profile includes the first name, last name, and display name you identified when you registered with developerWorks. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

Data structures: Make the right choice

Choose your data structures carefully or suffer the consequences

Nick Williams is a developer at a leading telecommunications company. You can reach him at nwilliams942@yahoo.co.uk

Summary:  Selecting the most appropriate data structure to store your application's data is important. Your choice of data structure affects the operation and performance of your application -- sometimes with little consequence, sometimes dramatically. Examine a real-world problem that affected an application server product, the diagnosis, and the resolution that effectively improved performance twofold.

Date:  14 Sep 2004
Level:  Introductory

Comments:  

Tracing the problem

The performance problem affected a new release of a server product that is responsible for processing media sessions. The automated testing system used by the engineering department recorded a 50 percent drop in performance compared to the previous release. Fortunately, the department employed processes that allowed them to clearly determine all changes to the code base. And the majority of the changes were all, quite ironically, performance bug fixes. A review of the changes reduced the number of candidates to two and the performance testing was then repeated with each change rolled back. This identified the change that was responsible for the performance hit. Fortunately, it was not the performance fix but a simple modification to the equals method of a class.

Before detailing the problem, it is worth mentioning that the department might have used a profiling tool to identify the source of the performance problem. In fact, this is the only option in situations where performance problems are discovered long after their introduction. The benefits of a daily performance testing approach becomes obvious.


The problematic equals method

Listing 1 highlights the changes made to the equals method. As you can see, the only difference is the evaluation of an additional String member variable. But is this really responsible for such a large drop in performance?


Listing 1. The equals method
public boolean equals(Object a_Object)
{
    if (a_Object instanceof DialogId)
    {
        final DialogId d = (DialogId) a_Object;
            
        if (m_CallID.equals(d.m_CallID) 
            && compareDestination(m_Destination, d.m_Destination) 
            && compareSource(m_Source, d.m_Source)
            && (m_Version == d.m_Version)
            && (m_Identifier.equals(d.m_Identifier)) // NEW
            )
        {
            return true;
        }
    }
    
    return false;
}

Table 1 shows the structure of the Identifier string being compared, together with an example.


Table 1. Identifier structure
z9hG4bKIP address (Hex)Port (Hex)Time (16-digit Hex)Unique integer
z9hG4bKC1C334A713CE000000FC83FAE3111

Two factors have an impact on performance in this case: the length of the string, and the fact that the first 35 characters are always the same. The longer the string, the more time it takes to perform the equality check, especially if the strings are only different near the end. If you change the structure so the sequence number moves to the front or shorten the string , you gain minor performance benefits. In this case however, those changes did not help.

The modified equals method itself was not responsible for the significant performance degradation. Instead, the way that the equals method was employed that was responsible.


Who uses the equals method?

The DialogController class manages instances of the Dialog class. The DialogController class makes indirect use of the equals method. During normal program flow, Dialog instances are inserted into a provisional data structure and, after being processed, moved to another completed data structure. The DialogController is a singleton and can typically manage over 6,000 concurrent Dialog instances. Listing 2 shows the relevant sections of the DialogController class.


Listing 2. Seven methods for creating a connection
public class DialogController
{
    private final Map m_Provisional = new HashMap();

    private final List m_Completed = new LinkedList();


    public boolean handleProvisional(Message a_Message)
    {
        DialogId d = new Dialog(a_Message);

        synchronized(m_SyncMaps)
        {
            if (m_Completed.containsKey(d))
            {
                ...
            }
            else if(m_Provisional.contains(d))
            {
                return true;
            }
            else
            {
                m_Provisional.add(d);
                return false;
            }
        }
        
        return true;
    }

    public boolean handleCompleted(Message a_Message)
    {
        Dialog d = new Dialog(a_Message);

        synchronized(m_SyncMaps)
        {

            if (m_Provisional.contains(d))
            {
                m_Provisional.remove(d);

                m_Completed.put(d, new Object());
                return true;
            }
        }
 
        return false;
    }
}

It is clear that access to the two data structures is synchronised. In some programs, contention for a lock can become a performance bottleneck, but here, the synchronization was present in the previous release. The more obvious candidate is the use of a LinkedList to hold the provisional Dialog instances, potentially a poor choice considering the number of entries in the list. Typically the list has several thousand entries at runtime. Note also that the completed instances are stored in a HashMap.

The link between the poor performance and the equals method lies in the way the LinkedList implementation performs the contains operation (and also the remove operation). Both involve a traversal of the list nodes and an execution of the Dialog equals method against each list node encountered. In a worst-case scenario, the whole list must be checked. On average, only half the list is checked. With several thousand entries, the cumulative time that the program spends comparing the new String member in the equals method is directly responsible for the performance problem and explains why an innocuous change can have serious consequences.

The fact that all of this activity occurs within a synchronized block is also important. In the previous release, the use of the synchronized blocks to make the program thread safe did not cause a problem. But in the new release, the enhanced equals method clearly highlights this area as a performance bottleneck. If you change the new release so it is no longer thread-safe, you can improve the performance, albeit at the expense of data integrity!


The right choice

In this particular case, a better choice is a hash-based collection, which makes the storage and retrieval of objects much faster. Additionally, Java supports hashing directly in the object model through the hashCode method, which is used by the hash-based collections.

Hash-based collections aim to provide efficient insertion and find operations. To meet these requirements certain concessions are made. A specific ordering of the items in the collection is not required. In addition, the ability to efficiently remove items from the container is not a primary objective.

Now that you've chosen a hash-based collection, it is vital that the hashCode method of the Dialog class generate the correct hash codes. If too many objects share the same hash-code, the collection classes resort to an equals-based evaluation. A good hash function avoids collisions, tends to spread keys evenly in the array, and is quick and easy to compute.

In contrast, the singly linked list is the most basic of all the linked data structures. It is simply a sequence of dynamically allocated objects, each of which refers to its successor in the list. Adding to the head and tail end is quick and inexpensive, and inserting into the middle even more so. This also facilitates sorting. Retrieving list items, however, is potentially expensive, as each node has to be traversed and evaluated.

Using a HashSet rather than a LinkedList to store the provisional Dialogs resulted in a 205 percent increase in performance.

On a final note, you've seen how the provisional Dialogs were stored in a LinkedList. However, I noted earlier the choice of a HashMap for the completed Dialogs. Whether this choice was made because of performance reasons is unclear. More likely it was chosen because of the need to store additional data alongside the Dialog.


In conclusion

A developer can easily become complacent or make a genuine mistake when selecting a data structure. And perhaps the number of provisional Dialogs changed over time and the suitability of a LinkedList was not reconsidered. Regardless of the reason, I've illustrated the dramatic difference between getting data structure right or wrong. Getting it wrong can have a terrible impact on the performance of your applications. Now you can get it right.


Resources

About the author

Nick Williams is a developer at a leading telecommunications company. You can reach him at nwilliams942@yahoo.co.uk

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in

If you don't have an IBM ID and password, register here.


Forgot your IBM ID?


Forgot your password?
Change your password


By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. This profile includes the first name, last name, and display name you identified when you registered with developerWorks. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)


By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Web development
ArticleID=15104
ArticleTitle=Data structures: Make the right choice
publish-date=09142004
author1-email=nwilliams942@yahoo.co.uk
author1-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).