Introduction to profile-directed selection
Regression testing is often necessary to ensure software quality. However, it is widely considered an expensive process. To reduce this cost, test selection techniques are applied to the full test suite.
The result of a test selection technique is a reduced test suite. Several test selection techniques exist that attempt to reduce the test suite size while maintaining a high defect detection rate.
This article presents a safe and efficient test selection technique that relies on profile data to select test cases. We call this technique profile-directed selection (PDS).
PDS operates in two phases:
- Populating the database with profile data
- This first phase relates the application source code with the test cases. It executes the full test suite on a code coverage or profile-instrumented application. The resulting profile data catalogs the source files, function names, and line statements called by each test case. Each test case is then associated with the files and functions of the application that it encounters. This relationship is stored in a relational database. The code coverage data can be applied with finer granularity, but that can come at a cost: more data means more processing time and maintenance.
- Selecting the test cases based on the change set
- The second phase, test selection, constructs and executes a query based on the source code modifications made to the application. The query selects the test cases, which traverse a function modified by the code change.
Figure 1. Test selection in regression testing
Profiler instrumentation
PDS relies on profile data to relate test cases with an application's source code. Therefore, the application must be instrumented to generate profile data. When selecting a profiler, consider the following:
- Relative performance impact
- Output usability for tooling
- Storage requirements
In our experiments, we chose to use function hooks through the functrace feature of the IBM XL C/C++ and Fortran compilers. We used this compiler feature to place a func_trace_enter and func_trace_exit hook on every function in our sample application. These functions generate the profile data.
PDS uses a relational database to relate each function and source file with each test case. We used IBM DB2® for our experiments.
PDS is a function-level granular selection technique. The code coverage granularity can be as fine or as coarse as necessary. We chose function-level granularity because it represents a median between selection technique processing costs and test suite size reduction. Finer-grained test selection techniques, such as control-flow graph-based techniques, might select fewer test cases, but they come at a cost of more processing time and storage requirements. For applications consisting of hundreds of thousands of test cases, the processing cost can exceed the benefit of the technique, therefore granularity should be considered.
Function change set detection
Modifications made to the source files between S and S' are called the change set. Extracting the modified functions and modified source files in the change set was a manual step in our experiments.
Most integrated development environment (IDE) applications can detect the changes made between S and S' automatically.
Modeling selection technique effectiveness
The effectiveness of profile-directed selection was based on defect detection effectiveness and average test suite reduction rate.
Profile-directed selection technique, or PDS
The PDS technique aims to select test cases that were affected by the change set between S and S'.
As mentioned previously, PDS consists of two phases:
- Populating the database with profile data
- Selecting the test cases based on the change set
Figure 2. Profile-directed selection overview
Algorithm
PDS can be expressed in the following pseudo-code:
1. Test S with T to generate P={P1,…,Pn}, the profile data of S.
2. For each test case Ti for i=1 to n, relate Pi to Ti in a relational database.
3. Let P' be the source code change set between S and S'.
4. For i=1 to n, if |P'∩Pi|>0 then return Ti. |
Phase 1. Populating the test case database
Phase 1 of PDS populates the database by relating the test cases to the source code.
A prerequisite for Phase 1 is that S must be instrumented with a profiler. In our case, we used the functrace feature in the IBM XL C/C++ and Fortran compiler to insert func_trace_enter and func_trace_exit hooks on each function.
Figure 3. Build a profile-enabled application with functrace
Our hook definitions generated profile data that included the source file paths and functions that were exercised. To reduce the total storage requirements, we used a Set data structure to store unique function names and source paths that were exercised. We also retained the call frequency of each function. The resulting profile data has the following format for each test case:
Source:Function:CallFrequency main.c:main():1 main.c:foo():2 boo.c:goo():1 |
The first column represents the source file, the second column represents the function name, and the last column represents the call frequency.
With this design, we ran the entire test suite T on S. This generated profile data P for each test case.
For each test case Ti for i=1 to n, we related the profile data Pi to Ti in a relational database.
This is the database schema:
Table: TestCaseToSourceCode
Test Case Path, Function Name, Source Path |
In our implementation, as code Listing 1 shows, we normalized this table and created three tables for the test case path, source file, and function name.
Listing 1. Implementation example using functrace
#include <iostream>
#include <sstream>
#include <fstream>
#include <map>
#include <demangle.h>
static bool exitedMain = false;
struct ProfileData
{
std::string sourceFile;
std::string functionName;
};
struct Comparator
{
bool operator()(ProfileData s1, ProfileData s2) const
{
return strcmp((s1.sourceFile+ s1.functionName).c_str(),
(s2.sourceFile + s2.functionName).c_str()) < 0;
}
};
static std::map<ProfileData, int, Comparator>* tccMap = NULL;
static std::string getDemangledName(std::string mangledFunctionName)
{
char *rest;
Name* name;
if ((name = Demangle(const_cast<char*>(mangledFunctionName.c_str()), rest)) != NULL)
return ((FunctionName *)name)->Text();
else
return mangledFunctionName;
}
static void printData(const ProfileData &data, int callFrequency)
{
std::string functionName = getDemangledName(data.functionName);
std::ofstream outputFile("func_trace_profile.dat", std::ios::out | std::ios::app);
outputFile << data.sourceFile << "|"
<< functionName << "|" << callFrequency << std::endl;
outputFile.close();
}
std::string basename(std::string path)
{
std::string::size_type size = path.find_last_of ('/');
if ( size == std::string::npos )
return path;
return path.substr(size + 1);
}
extern "C"
void __func_trace_enter(const char *functionName,
const char *fileName, int lineNumber, void** const id)
{
if (tccMap == NULL)
tccMap = new std::map<ProfileData, int, Comparator>;
ProfileData data;
data.sourceFile = basename(fileName);
data.functionName = functionName;
if (tccMap->find(data) == tccMap->end())
{
if (exitedMain)
printData(data, 1);
(*tccMap)[data] = 1;
}
else
{
(*tccMap)[data]++;
}
}
extern "C"
void __func_trace_exit(const char *functionName,
const char*fileName, int lineNumber, void** const id)
{
if (strcmp(functionName, "main") == 0)
{
std::map<ProfileData, int, Comparator>::iterator myIterator;
for(myIterator = tccMap->begin(); myIterator != tccMap->end(); myIterator++)
{
int callFrequency = myIterator->second;
printData(myIterator->first, callFrequency);
}
exitedMain = true;
}
} |
We omitted the call frequency from the profile data in our experiments, but planned to use it for ranking purposes in the future.
When the modified software S'became available, we proceeded to Phase 2.
In Phase 2, we manually detected the functions modified, added, and deleted between S and S'. In the case of deleted and added functions, we identified the parent function. From this change set, we constructed a query from the relational database, which produced a reduced test suite T'.
After we established a mechanism to select test cases based on the software change set, we conducted an experiment to analyze the effectiveness of the PDS technique.
Our experiment compared three test selection techniques (shotgun or random, manual selection, and PDS) with a complete test suite. We tested the suite using the IBM XL C/C++ Version 11.1 compilers for AIX®.
We selected two internal test suites. One had 832 test cases, and other had 781.
Using the two internal test suites, there were a total of 16 defects found between the two builds of the IBM XL C/C++ V11.1 compiler that we chose to test. It took approximately 900 seconds to run each test suite.
The random and manual selection techniques were very inexpensive, taking less than 50% of the time of a full test suite run. However, they yielded much less effective defect detection rates, finding only 37.5% (6 out of 16) and 43.75% (7 out of 16) of the total number of defects.
The PDS technique also reduced the processing time by approximately 50%, on average, between the two test suites. It was significantly more effective in detecting defects, yielding a 100% (16 out of 16) detection rate over the random and manual selection run.
Therefore, in determining the cost-effectiveness of each technique, we looked at the number of defects found over the execution time of each test suite. The PDS technique yielded a result of 0.3 defects per second, but he full, random (shotgun), and selective techniques (manual selection) all had results of less than 0.2 defects/second.
Figure 4. Raw empirical data, by technique
Figure 5. Effectiveness of selection techniques
In the future, we will study the impact of various granularities. Specifically, we aim to compare the processing and maintenance cost of populating the test case to the source code relationship. Additionally, we plan to find the granularity that will provide the lowest processing cost while maintaining a high test suite reduction rate.
This technique could be applied to sanity-test execution runs, where a fixed amount of test cases is required. The problem that needs investigation is which test cases to choose and how to rank these test cases. One possibility might be to combine this technique with prioritization techniques, one of which could be to use the call frequency as a rank.
Tool integration for automatic code verification
Ultimately, we would like to see this algorithm implemented into existing IBM applications, such as Rational Application Developer, Rational Developer for System z, and Rational Developer for Power Systems Software, and Rational Team Concert™. Ideally, when a developer has submitted changes, a list of test cases that are affected by the given change set should be selected and executed.
Test cases should execute on-demand, based on source code changes. This enables developers to automatically verify their code changes, thus avoiding the need for full regression runs.
We invite you to provide suggestions, feedback, or other comments regarding the PDS technique and its potential use in the testing field.
The authors thank the following individuals who made this article possible: Preston Koo, Roch Archambault, and Kobi Vinayagamoorthy.
| Name | Size | Download method |
|---|---|---|
| func_trace_profile.dat | 241KB | HTTP |
| functrace.C.zip | 3KB | HTTP |
Information about download methods
Learn
- References related to this article:
- G. Rothermel and M.J. Harrold. A Safe, Efficient Regression Test Selection Technique. In ACM Trans. Software Eng. and Methodology, vol. 6, no. 2, pp. 173-210, Apr. 1997.
- Todd L. Graves, Mary Jean Harrold, Jung-Min Kim, Adam Porter, Gregg Rothermel. An empirical study of regression test selection techniques. In Proceedings of the 20th international conference on Software engineering, p.188-197, April 19-25, 1998, Kyoto, Japan.
- IBM XL C/C++ for AIX, V11.1, Compiler Reference, 2009.
- For more information about the compilers:
- IBM XL C/C++ compilers for AIX, Linux, IBM z/OS®, IBM z/VM®, plus the XS C/C++ Advanced Edition for Blue Gene® and the Development Studio for IBM System i®.
- IBM Fortran compilers: XL Fortran for AIX, XL Fortran for Linux, XL Fortran Advanced Edition for the Blue Gene, VS Fortran
- Visit the Rational software area on developerWorks for technical resources and best practices for Rational Software Delivery Platform products.
- Subscribe to the developerWorks weekly email newsletter, and choose the topics to follow.
- Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
- Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM products and tools, as well as IT industry trends.
- Watch developerWorks on-demand demos, ranging from product installation and setup demos for beginners to advanced functionality for experienced developers.
Get products and technologies
- Download and evaluate XL C/C++ for AIX, which offers advanced compilation and optimization technologies designed for AIX and Power Systems.
- Download a free trial version of Rational software.
- Evaluate other IBM software in the way that suits you best: Download it for a trial, try it online, use it in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement service-oriented architecture efficiently.
Discuss
- Get connected with your peers, ask and answer questions in the Rational Cafes, where you'll find forums for C/C++, COBOL, EGL, Fortran, HATS, PL/I, and more.
- Join the Rational software forums to ask questions and participate in discussions.
- Ask and answer questions and increase your expertise when you get involved in the Rational forums, cafés, and wikis.
- Join the Rational community to share your Rational software expertise and get connected with your peers.
- Rate or review Rational software. It's quick and easy.







