This article was originally published in December 2001 and refers to Rational Suite TestStudio.
"The moment you install a Web server at your site, you've opened a window into your local network that the entire Internet can peer through," points out one article on Web site security, "The World Wide Web Security FAQ" by Lincoln D. Stein and John N. Stewart. "Most visitors are content to window shop, but a few will try to peek at things you don't intend for public consumption. Others, not content with looking without touching, will attempt to force the window open and crawl in," warns the article. Because security is an issue for every Web site, any performance testing effort needs to take security matters into consideration. This article addresses how (and why) to simulate various forms of secure (and insecure) authentication and session identification using Rational Software tools.
This is the eleventh article in the "User experience, not metrics" series, which focuses on correlating customer satisfaction with your Web site application's performance as experienced by users. This article begins the final trilogy, which will discuss some advanced topics related to using IBM® Rational Suite® TestStudio® to conduct performance testing. Here's what the series has covered so far:
- Part 1: Introduction
- Part 2: Modeling individual user delays
- Part 3: Modeling individual user patterns
- Part 4: Modeling groups of users
- Part 5: Using timers
- Part 6: Working with outliers
- Part 7: Consolidating test results
- Part 8: Choosing tests and reporting results to meet stakeholder needs
- Part 9: Summarizing results across multiple tests
- Part 10: Creating a degradation curve
This article is intended for intermediate to advanced TestStudio users. A general knowledge of Web security methods and procedures will help you apply the concepts discussed. Some C programming experience and an ability to use and manipulate regular expressions are a must.
About authentication, cookies, and session tracking
Before we can discuss how to handle authentication and session tracking (and cookies used in either case) in performance test scripts, we first need to understand the basics of what they are, why they're used, and how they work. The sections below are intended to give you enough background so you can understand why it's important to ensure that your performance-testing scripts handle these aspects of Web applications correctly.
Authentication is most commonly thought of as the process of logging in to a computer, Web site, or application. There are many types of authentication, but the basic idea is that the user selects or is assigned credentials (normally a user name and password) that are associated with certain privileges. When the user tries to access the application or site, it asks for those credentials (except where the authentication is cookie based, as discussed below) and allows or prohibits access based on the credentials provided. Authentication is often done through a pop-up window, such as the one shown in Figure 1, but may also be done without a pop-up window, directly on the Web page.
Figure 1: Common authentication pop-up window
Authentication is particularly important to performance testing. If your scripts don't do authentication properly, they'll be prohibited from performing the action you're trying to test and will therefore provide misleading results. Remember, if a site requires you to log in -- ever -- you need to think about authentication when developing your scripts.
Did you ever notice that when you log in to certain Web sites after your initial visit, you don't have to present any credentials to gain access? That's because the site stored your user name and password on your machine in a little chunk of data known as a cookie. Not all Web sites leave cookies on your computer, but many do, especially the large well-known sites. They use cookies to track what you're viewing and to recognize your computer when you come back to visit again.
It's important to consider this during performance testing. If the site you're testing uses cookie-based authentication, you must take care to accurately script new versus return users. Authentication from user submission will likely be more performance intensive than authentication from a previously stored cookie. See "Persistent Client State HTTP Cookies" for a summary of how cookies work.
Quite often, Web sites that require authentication also use sessions. These sessions are generally assigned at the time authentication is processed and are tracked in one of a number of ways, including hidden fields, client-side or server-side cookies, or URL parameters. If a site whose performance you're testing tracks sessions, it's imperative that your script handle that session tracking correctly.
For our purposes, a session is the time and activity of a single user accessing a single Web site from a single browser without closing the browser or switching to another Web site. For instance, you go to a site that sells books and log in if you're a return visitor or create an account if you're a first-time visitor. This begins your session. You navigate the site, find the book you want, add it to your shopping cart, enter your billing information to purchase the book, and finally close your browser or move on to another Web site. This ends your session.
The important point for us is that from the time you logged in until you logged out (or more likely timed out), the bookstore application was keeping track of your activity. Session tracking is critical to the majority of B2B and B2C e-commerce Web sites. If session tracking isn't handled properly in your performance tests, they'll likely not be performing the activities they were designed to perform; instead, they'll simply be forcing unexpected error messages or, worse, associating activities such as orders with the wrong customer.
Handling authentication in your scripts
Since most sessions are tracked based on the results of authentication, we must first ensure that authentication is handled properly in our performance test scripts. The VuC recording and scripting mechanism typically does a very good job of handling authentication automatically. We'll discuss here the proper Rational Robot® configurations to ensure that three common types of authentication -- basic, secure, and automatic (with cookies) -- are handled correctly, and what additional considerations to evaluate.
Basic authentication is handled very cleanly in performance (VuC) scripts. Whether you record using the API, network, or proxy method, basic authentication will be detected (assuming, of course, that you're recording against a supported protocol that you have a license for). The option to change recording methods is found in Robot under Tools > Session Record Options on the Method tab, as shown in Figure 2. If you're not familiar with the three recording methods, please refer to Rational documentation.
Figure 2: Session Record Options, Method tab
To make sure basic authentication is simulated accurately by your performance test, be sure to do the following:
- Create datapools with enough valid user names and passwords to keep the same user from logging in multiple times concurrently during your tests -- unless, of course, you're intentionally doing security testing or certain types of functional testing and want to see how your application handles this anomaly.
- Ensure that all of the user names and passwords in the datapool are valid before executing the performance test.
I recommend recording a simple script that logs in to the home page and then ends. Then configure your suite to execute that script the same number of times as there are values in your datapool and let it run. Theoretically, there should be no errors. If there are errors, check the logs and see which datapool values caused the errors, and then validate them by hand. Then continue recording your scripts and linking them to the validated datapool.
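Part of the first checklist item can be checked mechanically before a run. The following is a minimal sketch in standard C (not VuC), assuming the datapool's user names have already been loaded into an array; the function name `datapool_covers_users` and the idea of comparing the row count against the number of virtual users are my own illustration, not a TestStudio feature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: returns 1 if the datapool has at least one row per
 * virtual user and no user name appears twice; returns 0 otherwise. */
static int datapool_covers_users(const char *const user_names[], size_t rows,
                                 size_t virtual_users)
{
    assert(user_names != NULL || rows == 0);

    /* Fewer rows than virtual users means some user name must be reused. */
    if (rows < virtual_users)
        return 0;

    /* Compare every pair of rows; fine for datapools of modest size. */
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = i + 1; j < rows; j++) {
            if (strcmp(user_names[i], user_names[j]) == 0)
                return 0;   /* duplicate user name found */
        }
    }
    return 1;
}
```

A check like this only tells you the datapool is large enough and free of duplicates. Whether each credential is actually valid still has to be confirmed against the application, as described above.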
Secure (https://) authentication is just a little trickier. Secure authentication is also handled well automatically by VuC scripts, but only when you use the API recording method. The reason for this is that the API recording method captures traffic before the encryption algorithm is initiated to convert the cleartext into encrypted form. If you use the network or proxy method to generate scripts against a site that uses secure authentication, the credentials appearing in your script will be encrypted and thus you won't be able to use values from a datapool to simulate different users.
To see the difference, compare the scripts shown in Listings 1 and 2 below. Listing 1 was captured using the network method. (Note that HTTPS traffic can't be captured via network or proxy methods using the HTTP protocol. If you need to record using network or proxy methods against HTTPS sites, you'll have to select the Socket protocol.) As you can see, all of the information is in hexadecimal format and would be more than slightly difficult to decode. Listing 2 is the exact same script with the same credentials (user name and password), recorded using the API method instead.
Listing 1: HTTPS authentication script (network method)
Listing 2: HTTPS authentication script (API method)
If we were to look at the associated Datapool_Config section of the script shown in Listing 2, we would see the user name and password in cleartext exactly as it was originally typed into the user name and password fields. Thus, with this method of script recording, you can use a datapool just like with basic authentication.
In some very rare cases, recording using the API method will still result in encrypted credentials appearing in your script. In this case, HTTPS authentication wasn't used, but rather some custom encryption method. If this happens to you, you'll have to speak directly to the developer and collaboratively determine how to handle this circumstance. I've only run into this once, and in this case, the developer disabled encryption for performance testing. This isn't the most realistic approach, but it was the only reasonable option available to us. Remember that performance testing with a single, quantifiable, primarily client-side exception is still tremendously better than no performance testing at all.
Automatic authentication using cookies
Often we end up testing sites that are designed to automatically log return users in to the application. This is accomplished most often using client-side cookies. Listing 3 is an example of the first request sent for a site that has a current client-side cookie.
Listing 3: Cookie authentication script
The site that I used in this example uses a cleartext cookie. In this case, each of the relevant values can be added into a datapool just like any other variable. To put the cookie values into a datapool, you would manually edit the script to replace the "UserName=myname; Password=mypass\r\n" line, as shown in Listing 4, and add the corresponding datapool values into the Datapool_Config section of the script.
Listing 4: Cookie authentication script, datapooled
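The idea behind datapooling a cleartext cookie can be sketched in standard C (not VuC): instead of a hard-coded "UserName=myname; Password=mypass\r\n" string, the line is assembled from one row of per-user values. The `credentials` struct and the `format_cookie_values` function are hypothetical names used only for this illustration; in a real script the values would come from the datapool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One hypothetical datapool row: the credentials for one virtual user. */
struct credentials {
    const char *user_name;
    const char *password;
};

/* Build the cookie value line for one user. Returns 0 on success, or -1 if
 * the formatted line does not fit in out. */
static int format_cookie_values(const struct credentials *row,
                                char *out, size_t out_size)
{
    assert(row != NULL && out != NULL && out_size > 0);

    /* Same shape as the recorded line, with the two values parameterized. */
    int written = snprintf(out, out_size, "UserName=%s; Password=%s\r\n",
                           row->user_name, row->password);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Calling `format_cookie_values` once per virtual user, each with a different row, gives every simulated return visitor a distinct cookie.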
Of course, if the cookie stores encrypted information, you'll need to determine the encryption algorithm to create your datapool. Encryption/decryption algorithms are beyond the scope of this article.
Handling session tracking in your scripts
Now that you know how to handle authentication, let's turn to session tracking. Session tracking can be accomplished in several ways, including the following common methods:
- storing unique session information in a client-side cookie
- appending a unique session ID to the URL
- passing unique session information in a hidden field
Any of these session-tracking methods can be either static or dynamic. Static methods assign the user a single session ID for the entire session, while dynamic methods assign a new session ID with every activity the user performs. I'll show you how to handle these methods properly in your scripts to ensure that your performance tests are measuring what you mean to measure.
Session tracking with client-side cookies
It seems like most Web sites these days want to leave a cookie on your computer when you visit them -- whether you're authenticated to get into the site or not. Listing 5 is an example of a cookie created by a visit to www.washingtonpost.com.
Listing 5: Cookie created by the Washington Post site
To be perfectly honest, I have no idea what those values represent, but if I were testing the site I would need to work with the developers to find out. I would assume that at least one of those numbers somehow corresponds to the time stamp of my last visit to the site and that some content is modified based on that time stamp. Another field is likely to be what's known as the "time to live" -- a number representing how long this cookie is considered to be usable, similar to the "sell by" date on most perishable grocery items.
The cookie in Listing 6 was created by going to the Rational Developer Network. As we know, that site authenticated and gave us the option of having it remember our password.
Listing 6: Cookie created by the Rational Developer Network™
You'll notice that the basic format of the two cookies is very similar. The biggest difference is the seemingly random string of characters in the second line of the Rational Developer Network™ cookie. Buried in that encrypted data are my authentication credentials.
These kinds of cookies are usually modified once every time you access the site. However, it's also possible that each page rewrites to that cookie, and thus really does provide dynamic session tracking. The important thing from a performance-testing perspective is to be aware of the cookies used on any site you test and proceed as follows:
- If there are values in a cookie that vary from user to user and test to test that would be stored on the computer before your test starts, you'll probably want to incorporate those values into a datapool or some other kind of variable. Remember when executing a script that the script doesn't read the cookies currently on the system but rather just the cookie information in the script.
- If, on the other hand, a new piece of information is written to a cookie when you first access the site, or possibly with each page, you'll likely need to capture that information dynamically while your script is running, put it into a variable, and then put it in its proper place in the script. I'll demonstrate how this is done later in the "Capturing Dynamic Session Data" section.
Session Tracking with Static Session IDs
Static session IDs are the most common method used to track sessions and are typically appended to URLs or passed in hidden fields. As with cookies, there are a myriad of ways these can be handled, but from a scripting perspective there are three basic approaches:
- Put session IDs into datapools.
- Correlate the variable(s).
- Pull the session ID from the _response (read "underscore response") file on the fly.
I'll discuss the first two approaches in this section and the third approach in a section of its own, since the method of capturing dynamic data that I'm going to show you can also be applied to session tracking with cookies -- or to capturing any other type of data that needs to be captured dynamically during test execution.
Each of these methods assumes that you can first identify the session ID in your script. Sometimes it's very obvious -- for example:
JSessionID=lotsofrandomlookingcharacters
in the header file, or
?Session=otherrandomcharacters
appended to the URL.
If you know the site tracks sessions and you can't find the session ID, either get help from the person who developed the session-tracking mechanism for the site or, as a last resort, record an identical script several times and see what changes from one script to the next. (I say this is the last resort since just because something changes, that doesn't make it a session ID.)
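The last-resort comparison can be partly automated. Here is a minimal sketch in standard C, assuming you have the same request line from two separate recordings as strings; the `diff_span` struct and `changed_span` function are hypothetical names. It reports the span that remains after stripping the text the two recordings share at the beginning and at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Describes the part of recording a that differs from recording b. */
struct diff_span {
    size_t offset;   /* index in a where the two recordings first differ */
    size_t length;   /* number of differing characters in a */
};

static struct diff_span changed_span(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t len_a = strlen(a), len_b = strlen(b);

    /* Length of the common prefix. */
    size_t prefix = 0;
    while (prefix < len_a && prefix < len_b && a[prefix] == b[prefix])
        prefix++;

    /* Length of the common suffix, never overlapping the prefix. */
    size_t suffix = 0;
    while (suffix < len_a - prefix && suffix < len_b - prefix &&
           a[len_a - 1 - suffix] == b[len_b - 1 - suffix])
        suffix++;

    struct diff_span span = { prefix, len_a - prefix - suffix };
    return span;
}
```

Two caveats apply. If the two values happen to begin or end with the same characters, the reported span will be narrower than the real value. And, as noted above, a value that changes between recordings is only a candidate session ID, so confirm it with the developers.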
Datapooling the session IDs may sound like a good idea, but it very rarely is. In fact, I've never had occasion to use this method. The only time I can think of that this would be a good idea is for security testing. However, if you have a list of session IDs that you want to use in your test, you can follow the same thought process as discussed for datapooling user names and passwords in cookies.
Usually, Robot can handle session IDs very simply by correlating variables in the response to subsequent requests. Unless you already know the name of the variable(s) you want to correlate, you'll probably want to start by going to Tools > Session Record Options, clicking the Generator per Protocol tab, and ensuring that the "Correlate variables in response" option is set to All, as shown in Figure 3. Once you find the variable(s) you want to correlate, you may want to identify it or them by setting the "Correlate variables in response" option to Specific, clicking the Add button, and entering the variable name(s).
Figure 3: Session Record Options, Generator per Protocol tab
A script generated with correlation will have one or more (usually many more) sections that are similar to Listing 7. The sample shows that the SgenRes strings are populated with values on the fly (indicated in boldface) and then plugged into a later request as a variable. This is an extremely powerful feature that works very well as long as Robot is able to identify the variable.
Listing 7: Sample from a script that correlates variables
Capturing Dynamic Session Data
Robot is very good at detecting and correlating variables almost all of the time. In a circumstance where it can't detect a dynamic variable of interest, you'll need to capture the variable manually on the fly. Luckily, this isn't as difficult as it sounds.
During playback, Robot keeps all the information that eventually ends up in the log file accessible in the _response file. This file contains all the data received from the server as a result of the requests sent by your script. To see the contents of this file, you can change your record options on the Generator tab to show all recorded data by setting the "Display recorded rows" option in the Return Data section to All, as shown in Figure 4. When your script is generated, everything between #if 0 and #endif is what exists in the _response file during playback.
Figure 4: Session Record Options, Generator tab
Listing 8 is an example of a script with a session ID that needs to be captured dynamically. In this case, the session ID is being passed as part of the URL and has been recognized by Robot as a datapool item. However, since we have no way of knowing what the session ID is going to be beforehand to put it into a datapool, we need to capture it dynamically.
Listing 8: Unmodified script with session ID
To capture the session ID dynamically, we have to follow several steps. First, we need to add to the top of our script the following variables that we'll use in parsing the session ID:
- str_sessionid, to hold the session ID string once it's parsed from the _response file
- int_start, to hold a number indicating the starting position of the session ID for use in parsing it out
- int_end, to hold a number indicating the ending position of the session ID
Listing 9 shows where these variables are declared.
Listing 9: Variable declarations
Now, before we can write the code to parse the session ID, we must find the initial occurrence of the value. Listing 10 shows the section of code where the session ID was found in this case.
Listing 10: Location of session ID in _response file
Now that we've located the session ID, we need to write an expression to capture it that we'll place before the next http_request in the script. Listing 11 is the expression written to capture the string in this case.
Listing 11: Session ID parsing code
It's beyond the scope of this article to explain in detail the methods used to capture a string programmatically, but I'll explain what's happening in each line of the code above. We're essentially defining where the session ID string starts and ends, and then capturing it and assigning that value to the str_sessionid variable.
The first line of code searches the preceding HTML (in the _response file) for the string "sessionId=". When it finds that string, it assigns our int_start variable a value that's the numeric value of the character position where the string "sessionId=" begins. Note that it identifies the position of the first s, not the space following the =. When doing this yourself, make sure the string you use to find the start character is unique in the HTML.
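The second line of code searches the preceding HTML (again in the _response file) for the string "\" method=". The backslash is only there to escape the quotation mark inside the string, so the text actually being searched for begins with a quotation mark. When the code finds the string, it assigns the int_end variable the value of the character position of that quotation mark. This is the character immediately following the last character of the session ID.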
The third line is a little more complicated. The syntax for this line is
variable = substr(whole-string, starting-position, substring-length)
From left to right, the line says: Set the variable str_sessionid equal to a substring of the entire _response for this HTML page starting at the position represented by the value of int_start plus ten spaces (to move to the position immediately past the = in "sessionId=", which is where the session ID begins) and continuing on for the number of spaces represented by int_end minus (int_start + 10).
If this is new to you, please refer to any good introduction-to-C book. String manipulation isn't the most complicated programming in the world, but it's a bit tricky until you get used to it.
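For readers who want to experiment outside Robot, the three steps described above can be sketched in standard C. This is not VuC: the text above describes strstr as yielding a character position and substr as taking a position and a length, whereas standard C's strstr returns a pointer, so the sketch works with pointers instead. The function name `extract_between` is a hypothetical helper, and the sample markers are the ones discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the text found between start_marker and
 * end_marker in response into out. Returns 0 on success, or -1 if either
 * marker is missing or the value does not fit in out. */
static int extract_between(const char *response, const char *start_marker,
                           const char *end_marker, char *out, size_t out_size)
{
    assert(response != NULL && start_marker != NULL && end_marker != NULL);
    assert(out != NULL && out_size > 0);

    /* Step 1: locate the start marker (the role of int_start). */
    const char *start = strstr(response, start_marker);
    if (start == NULL)
        return -1;

    /* Skip past the marker itself, like "int_start + 10" for "sessionId=". */
    const char *value = start + strlen(start_marker);

    /* Step 2: locate the end marker (the role of int_end). */
    const char *end = strstr(value, end_marker);
    if (end == NULL)
        return -1;

    /* Step 3: copy exactly the characters between the two markers. */
    size_t len = (size_t)(end - value);
    if (len >= out_size)
        return -1;
    memcpy(out, value, len);
    out[len] = '\0';
    return 0;
}
```

One deliberate difference from the description above: the sketch searches for the end marker starting just past the start marker rather than from the top of the response, so an earlier occurrence of the end marker can't produce a negative length. It also reports failure when a marker is missing, which is worth checking for during debugging.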
Finally, having captured the session ID, we need to substitute our variable into all the places in the script where the datapool value for sessionId is located. Listing 12 shows the modified code for the http_request referenced in Listing 8.
Listing 12: Modified script with session ID replaced by variable
Now when the script plays back, it will always capture the current session ID assigned by the application. Here are some things to remember when using this method:
- If you're using split scripts, remember to make str_sessionid a persistent variable that passes from script to script for that virtual user.
- Use the printf command to write your variables to a file while testing your script for debugging purposes.
There are other functions that are useful for parsing strings. Refer to your favorite C book to find out about them.
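One check worth scripting while you debug is the earlier advice that the start marker be unique in the HTML. The following standard C sketch (again, not VuC) counts how many times a marker occurs in a response; the function name `count_occurrences` is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of marker in text. */
static size_t count_occurrences(const char *text, const char *marker)
{
    assert(text != NULL && marker != NULL);

    size_t marker_len = strlen(marker);
    if (marker_len == 0)
        return 0;   /* an empty marker is not a meaningful search */

    size_t count = 0;
    /* Resume each search just past the previous match. */
    for (const char *p = strstr(text, marker); p != NULL;
         p = strstr(p + marker_len, marker))
        count++;
    return count;
}
```

A result of exactly 1 means the marker is safe to use as the start of the capture. A result of 0 or more than 1 suggests choosing a longer, more specific marker string.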
I'm fairly certain that you're thinking one of two things right now:
"Cool! I wish someone had told me that before. That would have solved that problem I was having!"
or
"Wow, Scott, I've read ten of your articles and understood all of them, but this time you left me in the dust!"
If you're one of the people thinking this is cool, you already know where you're going to try out capturing dynamic data. If you're in the second group, keep reading.
We're going back to our favorite example site, www.noblestar.com. Once again, you can pick any site you like, but if you do choose to record and play back against the Noblestar site, I request that you don't play back more than ten concurrent users.
First, launch Robot and go to Tools > Session Record Options. Ensure that on the Generator tab, "Use datapools" is checked and "Display recorded rows" is set to All. On the Generator per Protocol tab, ensure that "Correlate variables in response" is set to All. Then record a script that opens the Noblestar home page and navigates to at least one other page.
When you look at this script, you'll see that part of the request line includes:
"Cookie: NSES40Session=somebunchofrandomlookingcharacters\r\n"
So now we know that the Noblestar site uses cookies. You'll also notice that there are no SgenRes lines. In truth, Robot would handle this session ID with no modification on our part. But since I don't have access to a site that it wouldn't work with, we're going to pretend that it wouldn't work here and that we need to handle this session manually.
First, add the variable declarations at the top of the script as shown in Listing 9. Then copy the code from Listing 11 and paste it before the next http_request, which is where an SgenRes block would otherwise appear. Next you'll want to edit the code to look like this:
int_start = strstr(_response, "NSES40Session=");
int_end = strstr(_response, ";path=");
str_sessionid = substr(_response, int_start + 14, int_end - (int_start + 14));
As you can see, we edit the code to represent the new start and end positions based on what we can see in the following line in the first #if 0 block:
"Set-cookie: NSES40Session=morerandomlookingcharacters;path=/;"
Finally, we'll need to search and replace the current session ID with the variable. The new Cookie: line in the http_request blocks should look like this:
"Cookie: NSES40Session=" + str_sessionid + "\r\n"
It's just that simple.
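If you'd like to try the same logic outside Robot, here is a minimal sketch in standard C (not VuC) that pulls the NSES40Session value out of a response and writes the matching Cookie: line. The function name `build_cookie_line` and the 128-character limit on the session ID are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the NSES40Session value in the response and
 * write a matching "Cookie:" request line. Returns 0 on success, or -1 if
 * the cookie is absent or a buffer is too small. */
static int build_cookie_line(const char *response, char *line, size_t line_size)
{
    static const char name[] = "NSES40Session=";   /* 14 characters */
    static const char terminator[] = ";path=";
    char session_id[128];                          /* assumed upper bound */

    assert(response != NULL && line != NULL && line_size > 0);

    const char *start = strstr(response, name);
    if (start == NULL)
        return -1;
    start += strlen(name);              /* same offset as int_start + 14 */

    const char *end = strstr(start, terminator);
    if (end == NULL)
        return -1;

    size_t len = (size_t)(end - start);
    if (len >= sizeof session_id)
        return -1;
    memcpy(session_id, start, len);
    session_id[len] = '\0';

    /* Equivalent of "Cookie: NSES40Session=" + str_sessionid + "\r\n". */
    int written = snprintf(line, line_size, "Cookie: NSES40Session=%s\r\n",
                           session_id);
    return (written < 0 || (size_t)written >= line_size) ? -1 : 0;
}
```

The offset of 14 used in the script is simply the length of "NSES40Session=", which is why the sketch computes it with strlen instead of hard-coding it.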
You've learned how to handle authentication and session tracking in your test scripts. Robot's VuC recording and scripting mechanism handles authentication well if you make sure it's configured properly, based on whether the authentication is basic, secure, or automatic (using cookies). You can handle session tracking with client-side cookies or with session IDs by putting them into datapools, by correlating the variables, or by capturing dynamic session data. Knowing how to do the last of these will enable you to conduct performance tests on applications that would otherwise be essentially impossible to test. Remember, if you're not a C programmer, you'll find a good C reference book to be invaluable as you learn how to do this.
- "The World Wide Web Security FAQ" by Lincoln D. Stein and John N. Stewart
- "Persistent Client State HTTP Cookies" (Netscape site)
Currently, Scott Barber serves as the lead Systems Test Engineer for AuthenTec. AuthenTec is the leading semiconductor provider of fingerprint sensors for PCs, wireless devices, PDAs, embedded access control devices and automotive markets. He is also a member of the Technical Advisory Board for Stanley-Reid Consulting, Inc.
With a background in consulting, training, network architecture, systems design, database design and administration, programming, and management, Scott has become a recognized thought leader in the field of performance testing and analysis. Before joining AuthenTec, he was a software testing consultant, a company commander in the United States Army and a government contractor in the transportation industry.
Scott is a co-founder of WOPR (the Workshop on Performance and Reliability), a semi-annual gathering of performance testing experts from around the world, a member of the Context-Driven School of Software Testing and a signatory of the Agile Manifesto. He is a discussion facilitator for the Performance and VU Testing forum on Rational DeveloperWorks and a moderator for the performance testing and Rational TestStudio related forums on QAForums.com. Scott speaks regularly at a variety of venues about relevant and timely testing topics. Scott's Web site complements this series and contains much of the rest of his public work. You can address questions/comments to him on either forum or contact him directly via e-mail.