Data structures: Make the right choice

Choose your data structures carefully or suffer the consequences

Selecting the most appropriate data structure to store your application's data is important. Your choice of data structure affects the operation and performance of your application -- sometimes with little consequence, sometimes dramatically. Examine a real-world problem that affected an application server product, the diagnosis, and the resolution that effectively improved performance twofold.

Nick Williams (nwilliams942@yahoo.co.uk), Developer

Nick Williams is a developer at a leading telecommunications company. You can reach him at nwilliams942@yahoo.co.uk



14 September 2004

Tracing the problem

The performance problem affected a new release of a server product that is responsible for processing media sessions. The automated testing system used by the engineering department recorded a 50 percent drop in performance compared to the previous release. Fortunately, the department employed processes that allowed them to clearly determine all changes to the code base. And the majority of the changes were all, quite ironically, performance bug fixes. A review of the changes reduced the number of candidates to two and the performance testing was then repeated with each change rolled back. This identified the change that was responsible for the performance hit. Fortunately, it was not the performance fix but a simple modification to the equals method of a class.

Before detailing the problem, it is worth mentioning that the department might have used a profiling tool to identify the source of the performance problem. In fact, this is the only option in situations where performance problems are discovered long after their introduction. The benefits of a daily performance testing approach becomes obvious.


The problematic equals method

Listing 1 highlights the changes made to the equals method. As you can see, the only difference is the evaluation of an additional String member variable. But is this really responsible for such a large drop in performance?

Listing 1. The equals method
public boolean equals(Object a_Object)
{
    if (a_Object instanceof DialogId)
    {
        final DialogId d = (DialogId) a_Object;
            
        if (m_CallID.equals(d.m_CallID) 
            && compareDestination(m_Destination, d.m_Destination) 
            && compareSource(m_Source, d.m_Source)
            && (m_Version == d.m_Version)
            && (m_Identifier.equals(d.m_Identifier)) // NEW
            )
        {
            return true;
        }
    }
    
    return false;
}

Table 1 shows the structure of the Identifier string being compared, together with an example.

Table 1. Identifier structure
z9hG4bKIP address (Hex)Port (Hex)Time (16-digit Hex)Unique integer
z9hG4bKC1C334A713CE000000FC83FAE3111

Two factors have an impact on performance in this case: the length of the string, and the fact that the first 35 characters are always the same. The longer the string, the more time it takes to perform the equality check, especially if the strings are only different near the end. If you change the structure so the sequence number moves to the front or shorten the string , you gain minor performance benefits. In this case however, those changes did not help.

The modified equals method itself was not responsible for the significant performance degradation. Instead, the way that the equals method was employed that was responsible.


Who uses the equals method?

The DialogController class manages instances of the Dialog class. The DialogController class makes indirect use of the equals method. During normal program flow, Dialog instances are inserted into a provisional data structure and, after being processed, moved to another completed data structure. The DialogController is a singleton and can typically manage over 6,000 concurrent Dialog instances. Listing 2 shows the relevant sections of the DialogController class.

Listing 2. Seven methods for creating a connection
public class DialogController
{
    private final Map m_Provisional = new HashMap();

    private final List m_Completed = new LinkedList();


    public boolean handleProvisional(Message a_Message)
    {
        DialogId d = new Dialog(a_Message);

        synchronized(m_SyncMaps)
        {
            if (m_Completed.containsKey(d))
            {
                ...
            }
            else if(m_Provisional.contains(d))
            {
                return true;
            }
            else
            {
                m_Provisional.add(d);
                return false;
            }
        }
        
        return true;
    }

    public boolean handleCompleted(Message a_Message)
    {
        Dialog d = new Dialog(a_Message);

        synchronized(m_SyncMaps)
        {

            if (m_Provisional.contains(d))
            {
                m_Provisional.remove(d);

                m_Completed.put(d, new Object());
                return true;
            }
        }
 
        return false;
    }
}

It is clear that access to the two data structures is synchronised. In some programs, contention for a lock can become a performance bottleneck, but here, the synchronization was present in the previous release. The more obvious candidate is the use of a LinkedList to hold the provisional Dialog instances, potentially a poor choice considering the number of entries in the list. Typically the list has several thousand entries at runtime. Note also that the completed instances are stored in a HashMap.

The link between the poor performance and the equals method lies in the way the LinkedList implementation performs the contains operation (and also the remove operation). Both involve a traversal of the list nodes and an execution of the Dialog equals method against each list node encountered. In a worst-case scenario, the whole list must be checked. On average, only half the list is checked. With several thousand entries, the cumulative time that the program spends comparing the new String member in the equals method is directly responsible for the performance problem and explains why an innocuous change can have serious consequences.

The fact that all of this activity occurs within a synchronized block is also important. In the previous release, the use of the synchronized blocks to make the program thread safe did not cause a problem. But in the new release, the enhanced equals method clearly highlights this area as a performance bottleneck. If you change the new release so it is no longer thread-safe, you can improve the performance, albeit at the expense of data integrity!


The right choice

In this particular case, a better choice is a hash-based collection, which makes the storage and retrieval of objects much faster. Additionally, Java supports hashing directly in the object model through the hashCode method, which is used by the hash-based collections.

Hash-based collections aim to provide efficient insertion and find operations. To meet these requirements certain concessions are made. A specific ordering of the items in the collection is not required. In addition, the ability to efficiently remove items from the container is not a primary objective.

Now that you've chosen a hash-based collection, it is vital that the hashCode method of the Dialog class generate the correct hash codes. If too many objects share the same hash-code, the collection classes resort to an equals-based evaluation. A good hash function avoids collisions, tends to spread keys evenly in the array, and is quick and easy to compute.

In contrast, the singly linked list is the most basic of all the linked data structures. It is simply a sequence of dynamically allocated objects, each of which refers to its successor in the list. Adding to the head and tail end is quick and inexpensive, and inserting into the middle even more so. This also facilitates sorting. Retrieving list items, however, is potentially expensive, as each node has to be traversed and evaluated.

Using a HashSet rather than a LinkedList to store the provisional Dialogs resulted in a 205 percent increase in performance.

On a final note, you've seen how the provisional Dialogs were stored in a LinkedList. However, I noted earlier the choice of a HashMap for the completed Dialogs. Whether this choice was made because of performance reasons is unclear. More likely it was chosen because of the need to store additional data alongside the Dialog.


In conclusion

A developer can easily become complacent or make a genuine mistake when selecting a data structure. And perhaps the number of provisional Dialogs changed over time and the suitability of a LinkedList was not reconsidered. Regardless of the reason, I've illustrated the dramatic difference between getting data structure right or wrong. Getting it wrong can have a terrible impact on the performance of your applications. Now you can get it right.

Resources

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Web development on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Web development
ArticleID=15104
ArticleTitle=Data structures: Make the right choice
publish-date=09142004