
Rapid test automation with Rational Robot: A case study


Level: Introductory

Penny Witt, Consultant

15 Mar 2003

From The Rational Edge: This case study documents a rapid method of script automation, using Rational Robot, that resulted in very efficient test script development and a 100 percent improvement in script modification time for testing a large application under test (AUT).

This case study documents a rapid method of script automation, using Rational® Robot, that resulted in very efficient test script development and a 100 percent improvement in script modification time for testing a large application under test (AUT). Generally, functional tests take hours or days to automate. Further, when two or three object IDs change, every automated script that uses those IDs has to be modified, the spreadsheets have to be adjusted, or both. With this rapid approach to scripting, however, my team is able to develop an automated test in as little as fifteen or twenty minutes. And when an ID changes in the application, changing only one script fixes 500 to 1,000 tests. Not only that: the automation is self-documenting and does not require an expert programmer or additional training.

To develop and implement this method, the client — a Fortune 500 lumber company headquartered in the Pacific Northwest — hired one automation specialist from Odyssey Software and Consulting; that was me. Within two months, I was able to help the testers automate more than 100 manual base tests into a regression suite. By the eighth month, 300 tests were automated, and the company invested in five more Rational Robot licenses, for a total of eight machines that could run the functional tests in eight to twelve hours. At the end of the year, management was so impressed with the advantages of automated testing that they created a second environment just for general ledger automation, assigned a manual tester who knew general ledger testing to automate that area (beginning with our existing accounting test SHELLs), and started searching for a second full-time automation tester for the rest of the system.

Now that the project is nearing completion, the stakes for success and the potential for error are at their peak. All eighty sites in the US and Canada are up and running, which means that any problems in the software release can have a huge impact. The AUT is so complex and integrated that the test team is requiring that hot fix releases each have their own regression tests prior to release — and, because we are using this rapid automation method, we are able to do this without compromising schedule.

Background

The Application Under Test (AUT) was an enterprise-wide system that encompassed everything from inventory and point-of-sale to general ledger processes. Designed for use by 1,500 people in more than eighty locations in the US and Canada, it was written in C++ with a SQL Server 2000 back end. When I arrived at the company, development was in the third year of a four-year plan; the basic application was fairly stable, but a great many features remained to be developed.

Testing still consisted of manual tests by five testers. Although management wanted to automate the knowledge base that the testers had built, they had not come up with a viable method.

The testers had tried a "Record and Playback" approach to automating basic functions, but they found that maintaining the automated tests was too time-consuming and too technically demanding for them.

In addition, the company was very familiar with running automated scripts that drew information from data files. They were using this method full time to load data into the system with each new facility that went live. But it did not seem to transfer to functional testing, which had little to do with data and more to do with calculations, buttons, and varying paths through the AUT.

The company had also tried employing a full-time test automation programmer, but with little success; the programmer did not know the AUT well enough to automate it effectively.

The problem, therefore, was how to capture the testers' knowledge without depending on them to maintain the automated tests. The learning curve was clearly too steep to expect an automation tester to produce the automated scripts in a vacuum. At the same time, no manual tester could be expected to devote more than two or three hours of a forty-hour week to test automation. What the development organization needed was a methodology that would make automated coding fast, yet easy for testers and programmers alike to understand when problems occurred.





Analysis

After reviewing some standard processes in the AUT, we found that the same steps (click a button, fill in a field, select a process, and so on) were used consistently, but in varying orders. This meant that each individual action could be programmed as its own script, and we could assemble those scripts in whatever order each tested functional process required.

These rapid scripts had to be completely independent: they could not rely on previous or subsequent scripts. In other words, they could not include validations that a certain state existed before the action or that the action completed correctly, because any such validation would vary depending on the order in which the rapid scripts were used.
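To make the composition idea concrete, here is a minimal sketch in Python rather than Robot's SQABasic (which only runs inside Robot). It assumes each rapid script can be modeled as a function that performs exactly one action and validates nothing; the registry, the `rapid_script` decorator, and `run_shell` are hypothetical names, and appending to a list stands in for driving the real GUI.

```python
from typing import Callable, Dict, List

Step = Callable[[List[str]], None]

# Hypothetical registry mapping each rapid-script name to one GUI step.
RAPID_SCRIPTS: Dict[str, Step] = {}


def rapid_script(name: str) -> Callable[[Step], Step]:
    """Decorator that registers a step under its descriptive script name."""
    def register(func: Step) -> Step:
        RAPID_SCRIPTS[name] = func
        return func
    return register


# Each step does exactly one action and checks nothing before or after it,
# so it can appear anywhere in any SHELL. Appending to `actions` stands in
# for driving the real GUI.
@rapid_script("Sales - File New")
def sales_file_new(actions: List[str]) -> None:
    actions.append("File->New")


@rapid_script("Sales - Customer = Walker Lumber")
def sales_customer_walker_lumber(actions: List[str]) -> None:
    actions.append("Customer field <- 'Walker Lumber'")


def run_shell(script_names: List[str]) -> List[str]:
    """Run the named rapid scripts in order and return the actions taken."""
    actions: List[str] = []
    for name in script_names:
        RAPID_SCRIPTS[name](actions)
    return actions


# A SHELL is just an ordered list of names; the same step can be reused.
shell = ["Sales - File New", "Sales - Customer = Walker Lumber", "Sales - File New"]
print(run_shell(shell))
```

Because the steps carry no pre- or post-conditions, reordering them or reusing one in another SHELL requires no change to the step itself.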

Eliminating validations represents a drastic change from normal automated programming. Generally, critical areas in a test are validated so that if the full test fails, there is an indication as to where the test might have gone wrong. The log at the end of the test shows each validation result and then an end result for the entire test. Without such intermediate validations, it is difficult to determine precisely what might have gone wrong — the log simply shows a "Fail" for the full test.

Most steps taken in a test are treated as part of the total code and are not displayed in the log. This makes debugging a script difficult; there may be a validation showing what area passed or failed, but the tester still has to read the test code to figure out what steps were taken. When a programmer is running fifty functional tests, it is nearly impossible to remember all the steps taken in each process. The rapid scripts, in contrast, are assembled and called from a master script. The log shows a pass/fail result for each step, and the script name describes the step.
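A small Python sketch can illustrate the logging difference (it is not Robot's actual log format): every rapid script produces its own Pass/Fail entry under its descriptive name, so the log itself reads as the list of steps taken. The `run_with_step_log` function and the simulated failure are hypothetical, and continuing after a failure is an assumed design choice.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]  # (rapid-script name, action)


def run_with_step_log(steps: List[Step]) -> List[Tuple[str, str]]:
    """Run each step and log one Pass/Fail entry named after the rapid script.

    This sketch keeps going after a failure so every step appears in the log;
    stopping at the first failure would be an equally valid choice.
    """
    log: List[Tuple[str, str]] = []
    for name, action in steps:
        try:
            action()
            log.append((name, "Pass"))
        except Exception:
            log.append((name, "Fail"))
    return log


def missing_button() -> None:
    # Simulates a step whose object ID no longer exists in the AUT.
    raise LookupError("object not found")


log = run_with_step_log([
    ("Sales - File New", lambda: None),
    ("Sales - BUTTON Go", missing_button),
    ("Sales - File New", lambda: None),
])
for name, result in log:
    print(f"{result:<4}  {name}")
```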

Title bars, we found, change not only from area to area, but also within an area. This matters for Windows automation, which identifies the window to act on by its title bar. As a general rule, however, one word in each area's title bar stays the same. Because global variables and libraries are hard to document and trace, we wanted to keep their use to a minimum. Our goal was for rapid scripts to describe individual steps, so that anyone could figure out what a test did; using libraries and variables would defeat that goal of readability. Instead, the rapid scripts would hard code the one consistent word in the title bar and use a wildcard for the rest of the title. This decision dictated that all scripts would be separated by area.
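The title-bar rule can be sketched with Python's standard `fnmatch` module. This only illustrates the matching logic; Robot has its own window-recognition syntax, which is not reproduced here. The sample titles are hypothetical, and "Sales" is assumed to be the one stable word for the Sales area.

```python
from fnmatch import fnmatchcase


def title_pattern(area_word: str) -> str:
    """Hard-code the one stable word in an area's title bar; wildcard the rest."""
    return f"*{area_word}*"


def matches_area(title: str, area_word: str) -> bool:
    return fnmatchcase(title, title_pattern(area_word))


# Hypothetical title bars; only the word "Sales" is assumed to be stable.
titles = ["Sales Order - Walker Lumber", "Sales Quote 10452", "Purchasing - P001"]
print([t for t in titles if matches_area(t, "Sales")])
```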

Next, we needed a tracking method to identify each functional test as it was processed in the AUT. The point-of-sale system maintained a field in each transaction to track the number provided by the customer or supplier, as well as the AUT's own unique numbering system. This customer/supplier number field was perfect to use for tracking purposes: It didn't impact processing in the AUT, yet it was accessible throughout the system to provide user assistance.

Finally, since we were sharing the test environment with manual testers, we would need to select specific customers, suppliers, and item numbers to become the exclusive property of automated testing, and then set them up with certain properties and restrictions, depending on the functional tests.





Design

The name we used to track the test in the AUT needed to identify both the functional test and the test cycle. We decided to create a tracking name that started with the functional test ID. Then, instead of devising a complex, unique way to identify test cycles, we appended the hour, minute, and second at which the test was started. By looking in any transaction queue of the AUT, we could identify the correct test transaction by the functional test number, and the cycle by the hour, minute, and second. See Identifying Tests and Cycles below.

Identifying Tests and Cycles

In the Supplier Reference field for purchasing, the tracking name for the first functional test was: "P001 - 12:22:04".
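The tracking-name rule can be expressed as a one-line Python function (an illustration, not the project's SQABasic); `tracking_name` is a hypothetical name.

```python
from datetime import datetime


def tracking_name(test_id: str, started: datetime) -> str:
    """Functional test ID plus start time (HH:MM:SS) identifies test and cycle."""
    return f"{test_id} - {started:%H:%M:%S}"


print(tracking_name("P001", datetime(2003, 3, 15, 12, 22, 4)))  # P001 - 12:22:04
```

Because only the time of day is kept, two runs of the same test started at the same second on different days would share a name; the scheme relies on that being rare enough not to matter.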

An initial naming convention was established to distinguish a functional test script from the hundreds of small rapid scripts that would be developed. As we had already decided to organize all scripts by area, the first word or letter in a script name needed to identify the area. For functional test scripts, we decided to use a single letter for the area, leaving as much room as possible for the function description. After the letter came the functional test number, assigned in sequence as tests were developed, followed by a brief description of the test. Finally, we added the word SHELL at the end, both to distinguish functional scripts from rapid scripts and to indicate that they held numerous rapid scripts. Rapid script names would start with the name of the area, followed by a short description of the script's action. See the next section.

Functional Test Names and Rapid Script Names 
The first functional test in sales was 
"S001 - Simple Sales Order SHELL" 
The menu bar rapid script for File->Open in Sales was 
"Sales - Open Transaction". 

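Because the two kinds of names follow different patterns, a script list can be sorted into SHELLs and rapid scripts mechanically. The Python sketch below infers its regular expressions from the examples above; the three-digit test number and the letters-only area name are assumptions, and `classify` is a hypothetical helper.

```python
import re

# Assumed patterns, inferred from the examples above:
#   SHELL:  <area letter><3-digit number> - <description> SHELL
#   rapid:  <area name> - <action description>
SHELL_NAME = re.compile(r"^([A-Z])(\d{3}) - (.+) SHELL$")
RAPID_NAME = re.compile(r"^([A-Za-z]+) - (.+)$")


def classify(script_name: str) -> str:
    """Report whether a script name is a SHELL or a rapid script, and its area."""
    m = SHELL_NAME.match(script_name)
    if m:
        return f"SHELL area={m.group(1)} test={m.group(2)}"
    m = RAPID_NAME.match(script_name)
    if m:
        return f"rapid area={m.group(1)}"
    return "unrecognized"


print(classify("S001 - Simple Sales Order SHELL"))
print(classify("Sales - Open Transaction"))
```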
As noted above, each functional test was a "SHELL script" composed of many individual rapid scripts. Because the tracking name of the SHELL script had to be passed to many rapid scripts within the SHELL, it was put into a global variable at the start of the SHELL script. To ensure that the system could tell where each functional test started, the screen was cleared at both the start and the end of the SHELL script. See the section on SHELL Scripts Holding Rapid Scripts below.

SHELL Scripts Holding Rapid Scripts

''''''''''''''''''''''''''''''''''''''''''''''''''''' 
'''''S001 - Simple Sales Order SHELL''''' 
''''''''''''''''''''''''''''''''''''''''''''''''''''' 

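'''''''''''GLOBAL VARIABLE''''''''''''''''''''''''''' 
' ShellScript is assumed to be declared Global in a shared header (.sbh) file
' Time$ already returns "hh:mm:ss" as a string, so no Str() conversion is needed
ShellScript = "S001 - " + Time$ 
''''''''''''''''''''''''''''''''''''''''''''''''''''' 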

CallScript "Sales - File New" 
CallScript "Sales - Customer = Walker Lumber" 
CallScript "Sales - Transaction Type = Sales Order" 
CallScript "Sales - Customer PO" 
CallScript "Sales - Tools-Goto Lines" 
CallScript "Sales - 12031321020 Lumber Item" 
CallScript "Sales - Release Shipment" 
CallScript "Sales - Retrieve Order" 
CallScript "Sales - Verify Order Released" 
CallScript "Sales - File New" 

Estimating that there would be hundreds of tests in each area, we knew we would need to run numerous machines at the same time. We also knew there was potential for spurious record locks when certain areas crossed over into others during a function. To avoid this problem, we assigned each area its own set of customers and suppliers. Each machine ran a single area using a master script, as shown in the section below.

Master SHELL Script

'''''''''''''''''''''''''''''''''''''''''' 
'''''MASTER SALES SHELL''''' 
'''''''''''''''''''''''''''''''''''''''''' 
CallScript "S001 - Simple Sales Order SHELL" 
CallScript "S002 - Simple Sales Quote SHELL" 
CallScript "S003 - Simple Sales Adjustment SHELL" 
CallScript "S004 - Simple Credit Memo SHELL" 
CallScript "S005 - Simple Material Movement SHELL" 
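A master script can also be modeled in Python as an ordered collection of SHELLs for one area. In the sketch below, which assumes a SHELL can be represented as a callable returning True on success, a crash or failure is recorded and the run moves on to the next SHELL instead of stopping. `run_master` and the simulated crash are hypothetical.

```python
from typing import Callable, Dict, List


def run_master(shells: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every SHELL in insertion order; return the names of those that failed.

    An exception or a False result marks the SHELL as failed, and the run
    continues with the next SHELL.
    """
    failed: List[str] = []
    for name, shell in shells.items():
        try:
            ok = shell()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


def crash() -> bool:
    raise RuntimeError("AUT window not found")  # simulated crash


master_sales = {
    "S001 - Simple Sales Order SHELL": lambda: True,
    "S002 - Simple Sales Quote SHELL": crash,
    "S003 - Simple Sales Adjustment SHELL": lambda: True,
}
print(run_master(master_sales))
```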





Development

It took a month to produce the first thirty functional tests, each of which contained ten to thirty rapid scripts. By the second month we had added fifty more tests. As we created more functional tests we had to create fewer and fewer rapid scripts, because so many already existed. In eight months we had about 300 functional tests running off around 450 rapid scripts.

Generally, we could create three functional tests an hour. The technique was to copy a SHELL script, which started out similar to the new function, then change or add rapid scripts where the new function varied.

The build cycle (small bug fixes to the AUT) had little effect on the regression suite that was growing out of the test SHELLs. New features, however, broke a significant number of scripts: every area a feature touched had new object IDs and required changes to rapid scripts. Here, too, the benefit of rapid scripts was obvious. When we fixed an ID in one rapid script, it was automatically fixed in every SHELL script that used that rapid script. So even though the number of SHELL scripts increased tenfold, the number of IDs to correct did not.
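The reason a single fix propagates is that many SHELLs point at the same rapid script. A reverse index makes that fan-out visible; the Python sketch below assumes each SHELL can be represented as its list of rapid-script names, and `shells_using` plus the two sample SHELL contents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shells_using(shells: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each rapid script to the set of SHELLs that call it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for shell_name, steps in shells.items():
        for step in steps:
            index[step].add(shell_name)
    return dict(index)


shells = {
    "S001 - Simple Sales Order SHELL": ["Sales - File New", "Sales - BUTTON Go"],
    "S002 - Simple Sales Quote SHELL": ["Sales - File New", "Sales - TAB Delivery"],
}
index = shells_using(shells)
# Fixing an object ID inside "Sales - File New" repairs both SHELLs at once.
print(sorted(index["Sales - File New"]))
```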

The scripts were a great success with the testers. They would work with the programmer for about two hours a week to put together new SHELL scripts. These scripts covered the repetitive tests that testers had been running against each new build, and as they had to do less and less of that repetitive testing, they became increasingly supportive of automated testing. The scripts were fairly self-explanatory: one rapid script for each step. When the regression test was run, the programmer would first analyze any failures to determine whether each was a script failure (e.g., a wrong ID or new fields). If the failure did not appear to be caused by a problem with the automated script, the tester would take responsibility for analyzing the AUT problem.

Naming Conventions

It soon became clear that the naming convention needed to be more tightly controlled. Initially, the rapid scripts were named after whatever the testers called the action. But after a while, it was apparent that those names were not always the standard terminology for the actions, and different testers often called the same action by different names. Plus, the rapid script names did not seem to relate to any function keys displayed on the screen, which made it hard for both the programmer and other testers to be sure what a rapid script was doing. When testers specified "Book the Transaction" in a script, it was the end result of steps that started by selecting Tools->Record Receiving Results from the menu bar and then selecting Tools->Done Receiving. Further, some names stated the action correctly but gave no indication of how to perform it manually. For example, "Releasing Transaction" could have meant either selecting Shipment->Release from the menu bar or clicking the Release icon on the toolbar. Similarly, the rapid script for opening a search panel was named for the type of item being searched for, but the name did not indicate whether it clicked a button, used the menu bar, or right-clicked. As a result, when a functional test failed, it was hard to know exactly what steps to take to reproduce the failure manually.

Probably most confusing was finding the right rapid script in a list of 400. Sometimes a script was listed under a colloquial name, other times under its screen function name. We imposed stricter rules: rapid scripts were named with the area first, then the screen action taken, and then any data used in the action. That way, the scripts for each area fell together alphabetically, all the push-button actions were listed together, and all the tab actions were listed together. For a field entry, the label beside the field followed the area, then an "=" sign and the data entered. Finally, menu-bar selections listed the area, the name on the menu bar, a hyphen, and then the name on the drop-down menu. See the Naming Convention example below.

Naming Convention

Sales - Customer = Alen Afery 
Sales - Customer = Wayne World 
Sales - BUTTON Continue 
Sales - BUTTON Go 
Sales - File-New 
Sales - File-Open 
Sales - TAB Additional 
Sales - TAB Delivery 
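The convention is regular enough to generate. The Python helpers below are hypothetical and simply encode the rules just described (area first, then the action type, then the data); sorting the results shows how push-button and tab actions group together within an area. Note that Python's default string sort is case-sensitive, so the ordering may differ from a case-insensitive script listing.

```python
def field_entry(area: str, label: str, value: str) -> str:
    return f"{area} - {label} = {value}"


def button(area: str, caption: str) -> str:
    return f"{area} - BUTTON {caption}"


def tab(area: str, caption: str) -> str:
    return f"{area} - TAB {caption}"


def menu(area: str, menu_name: str, item: str) -> str:
    return f"{area} - {menu_name}-{item}"


names = [
    tab("Sales", "Delivery"),
    menu("Sales", "File", "New"),
    button("Sales", "Go"),
    field_entry("Sales", "Customer", "Wayne World"),
    button("Sales", "Continue"),
]
for name in sorted(names):
    print(name)
```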

Establishing Consistency

A problem that took a while to become apparent was that we were not consistently implementing our rapid approach across the board. At first, we put actions that seemed to run together into one script. The script "Purchasing - Tools-Record Received" was a menu bar selection that also included the next menu bar selection, "Shipment->Done Received," since the two were always done together. But as functional tests became more complex, we found that there were too many other things that could be done between selecting "Tools->Record Received" and selecting "Shipment->Done Received." So we had to go back and split the original rapid script into two separate scripts in every SHELL script that used it.

Hot keys were not used as much as they should have been at first. Because a hot key does not depend on an object ID, using one avoided having to change object IDs in areas where the code was reworked, so we began using hot keys more often instead of clicking on objects. The switch did not appear to increase or decrease the number of AUT failures we discovered in automated testing.

Further, as we got used to the changing Object IDs in new releases, we found that some of those objects could be isolated into one script and taken out of multiple scripts in which the other objects did not change. One example was a double-click on an object that brought up a detailed screen to do various functions. Since the double-clicked object ID always seemed to change, and the functions on the detailed screens rarely required ID changes, the double-click was put into a separate rapid script.

This breaking up of rapid scripts into smaller scripts was a massive undertaking. It was anybody's guess which SHELL scripts contained which rapid scripts. A program was developed to go through each SHELL, find the old rapid script, and replace it with the new rapid scripts. But in some cases the replacement required multiple changes that could not easily be programmed, so each SHELL had to be reviewed.
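The article does not describe that program further. A minimal Python reconstruction, assuming SHELL source is plain text with one SQABasic `CallScript "name"` per line, could rewrite each call to the old script as calls to its replacements. `split_rapid_script` and the "Purchasing - File New" script are hypothetical; the replacement count only helps flag SHELLs for the manual review that was still needed.

```python
import re
from typing import List, Tuple

# Matches a whole line of the form:  CallScript "Some Script Name"
CALL = re.compile(r'^(\s*)CallScript\s+"([^"]+)"\s*$')


def split_rapid_script(shell_source: str, old: str, new: List[str]) -> Tuple[str, int]:
    """Replace each CallScript line for `old` with one CallScript line per new script.

    Returns the rewritten source and the number of lines replaced.
    """
    out: List[str] = []
    count = 0
    for line in shell_source.splitlines():
        m = CALL.match(line)
        if m and m.group(2) == old:
            indent = m.group(1)
            out.extend(f'{indent}CallScript "{name}"' for name in new)
            count += 1
        else:
            out.append(line)
    return "\n".join(out), count


shell_src = "\n".join([
    'CallScript "Purchasing - File New"',
    'CallScript "Purchasing - Tools-Record Received"',
    'CallScript "Purchasing - File New"',
])
new_src, n = split_rapid_script(
    shell_src,
    "Purchasing - Tools-Record Received",
    ["Purchasing - Tools-Record Received", "Purchasing - Shipment-Done Received"],
)
print(n)
print(new_src)
```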

Verification of the functional test was another revised area. SHELL scripts usually ended the list of rapid scripts with one rapid script that verified the result of the functional test. After one particularly large release requiring major ID modifications, we noted that most verification scripts were used by only one SHELL script. Since the idea behind rapid scripts was to re-use them over and over in many scripts, verification scripts did not meet the criteria. In fact, they actually made the correction of object IDs harder: The user was required to go to a second script for every SHELL script verification. Therefore, verification of the functional SHELL script was coded into the end of the SHELL and deleted as a rapid script. This meant the SHELL script was not as clean and easy for a layperson to read.

As the programmer became more familiar with the application, she discovered that many identical dialogue boxes popped up in various areas of the AUT. These dialogue boxes had their own title bar, which did not change regardless of the area from which they were opened. Since all rapid scripts were named by area, these dialogues were being programmed over and over again, with at least one rapid script for each area. So we created a new area, "U," for universal rapid scripts; script names in this area start with U, followed by the title bar name of the dialogue box. This has cut down on the number of redundant scripts and continues to do so.
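The saving from the "U" area can be shown with a small count. This Python sketch assumes a dialogue's rapid script is named "<area> - <title>" when scripted per area and "U - <title>" when universal; both spellings and the "Confirm Delete" dialogue are hypothetical.

```python
from typing import Iterable, Set, Tuple


def script_names(usages: Iterable[Tuple[str, str]], universal_titles: Set[str]) -> Set[str]:
    """Name one rapid script per (area, dialogue title) usage.

    Titles in `universal_titles` collapse into a single "U" script; others get
    one script per area. The exact "U - <title>" spelling is an assumption.
    """
    names: Set[str] = set()
    for area, title in usages:
        if title in universal_titles:
            names.add(f"U - {title}")
        else:
            names.add(f"{area} - {title}")
    return names


# Hypothetical dialogue that appears, unchanged, in three areas.
usages = [("Sales", "Confirm Delete"), ("Purchasing", "Confirm Delete"), ("Inventory", "Confirm Delete")]
print(len(script_names(usages, set())))               # 3
print(len(script_names(usages, {"Confirm Delete"})))  # 1
```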

Multiple Environments and Releases

As the weekly regression test run began to prove itself valuable, management wanted to increase leverage for the automation by testing hot fixes that came out in between releases of the AUT. This was a big hurdle to overcome on two fronts.

First, the environment for testing hot fixes needed a separate database reflecting what was being used in the field. The future release we tested every week used a database that included future changes, which the release in the field did not have.

Second, new builds were being put into the future release daily. This new code introduced Object ID changes that would not be in the hot fix code. Therefore, some Object IDs in rapid scripts might vary, depending on whether they were running on the future code under test or the hot fix code.

Until this point, we had managed to limit ourselves to a single global variable: the tracking name of each test, passed from rapid script to rapid script within a SHELL script. Now it looked as if we needed at least one more global variable to indicate which release the scripts were testing.

Expanding Automation

Automated testing had only one programmer, who worked with the testers developing new tests three days a week, and then ran regression tests the last two days of the week. The thought was that the new build would first be checked by the automated regression test to see if any of the new code broke existing code in the AUT. Then, the testers could start their week knowing that basic code was unchanged; or, in the worst-case scenario, they would know what didn't need to be tested, because it had to be re-worked.

By this time, the regression test consisted of more than 300 SHELL tests running on eight machines, and the two days of regression testing followed a fairly consistent routine. The first eight to twelve hours were spent keeping all the machines running. There were still crashes, especially in areas where the most features were being developed; the programmer would simply note the failed script and restart the Master SHELL at the next script. When all scripts had run, the failed scripts were re-run to determine the cause: was it an automated script failure or an AUT failure? We developed a spreadsheet listing all the automated SHELL scripts, noted the Pass/Fail results of the regression test on it, and sent it out. A SHELL marked "Fail" indicated a possible AUT failure, which the testers could then research. Sometimes research meant watching the SHELL run, or working through the test manually while referring to the list of rapid scripts in the SHELL.
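The article does not say how the results spreadsheet was produced. As one possibility, a Python sketch can write the Pass/Fail summary as CSV, which any spreadsheet program opens; the CSV format, the column headings, and `results_csv` are assumptions.

```python
import csv
import io
from typing import Dict


def results_csv(results: Dict[str, bool]) -> str:
    """Return one CSV row per SHELL script with its Pass/Fail result."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["SHELL script", "Result"])
    for shell, passed in results.items():
        writer.writerow([shell, "Pass" if passed else "Fail"])
    return buf.getvalue()


print(results_csv({
    "S001 - Simple Sales Order SHELL": True,
    "S002 - Simple Sales Quote SHELL": False,
}), end="")
```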

The introduction of a second environment and the need to maintain alternating code required another person, but it took much trial and error to determine what skills that person needed. This form of programming was generally a poor fit for anyone with previous programming experience: not because it was too complex, but for just the opposite reason.

Most programmers use coding devices such as code libraries, global variables, arrays, loops, GoTo statements, and functions to create an application. Our approach to automated programming discourages this type of coding in favor of clear-cut descriptions of the actions taken to carry out the test. Unlike normal programming code, which is not readily understood by non-programmers and requires a great deal of documentation for maintenance, these descriptions are written in plain language as the script names. Not surprisingly, a programmer, or even an aspiring programmer, would not last long in this job.

The ideal candidate was a tester willing to apply his or her testing experience to our basic programming technique. Our initial need was for someone to run regression tests and learn to analyze them. We knew the analysis would take twice as long now that we were running in two environments and testing both future code and hot fixes. The most important part of analysis would be understanding what the functional test was supposed to do. Every time a script failed, we would have to determine whether the problem was in the AUT or whether the script had not been adapted to the release. A future release might well have a new feature that the hot fix did not have. Where the automated code looked for a different object ID or different data, an IF statement would be needed to decide whether the script should use the future code/data or the hot fix code/data.
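The release switch can be sketched in Python as one global setting read by an IF in the affected rapid script. `RELEASE`, `go_button_id`, and the two ID strings are hypothetical placeholders, not real Robot recognition strings.

```python
from typing import Optional

RELEASE = "future"  # hypothetical global, set once per run: "future" or "hotfix"


def go_button_id(release: Optional[str] = None) -> str:
    """Object ID for a control whose ID differs between code lines.

    Reads the global RELEASE unless a release is passed explicitly.
    """
    release = release or RELEASE
    if release == "future":
        return "id=1012"  # placeholder: ID introduced by a future-release build
    if release == "hotfix":
        return "id=1011"  # placeholder: ID still used by the code in the field
    raise ValueError(f"unknown release: {release!r}")


print(go_button_id(), go_button_id("hotfix"))
```

Keeping the branch inside the one rapid script that needs it, rather than in every SHELL, preserves the rule that a fix in one place repairs every test that uses the step.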

Much Accomplished, More to Do

We have been using this rapid test automation approach for over a year and a half; currently, we have more than 500 SHELL scripts and 1,200 rapid scripts. We have covered most areas of the application at a high level, but we must do a great deal more to complete those areas.

Regression test time has not increased significantly with this growth in the number of tests. More and more of the test areas are becoming stable, with few if any new changes. Also, as the AUT improved, the Customer PO or Supplier Reference Number became accessible in more places throughout the system. Because this is the same field our SHELL scripts use to identify the test, our scripts were able to access orders for validation more efficiently. AUT improvements have eliminated much of the previous test setup and test name searching.

Scripting might have gone more smoothly if all team members had more strictly observed the standards for the selectors and hot keys. As with most projects under deadline, team members often treated these standards as enhancements, and generally gave them a back seat. Once management started seeing the benefits of test automation, however, we began enforcing the standards more rigorously, because some areas could not be scripted without them.

As time has gone by, the testers have begun to rely more on automation to do initial runs in the areas they test. At the same time, they seem to have even less time to develop new automated scripts. As a half measure, we require manual test documentation for each new bug fix the testers finish testing. This documentation is then used either to create new SHELL scripts or to add to existing ones.

In addition, management was so sold on the advantages of test automation that they created a closed environment devoted strictly to an automated balancing of the general ledger account. This ensured data integrity and allowed for resulting data to be cleared for each new test cycle. A manual tester who had been doing general ledger testing was able to build on initial SHELL scripts in the accounting areas to create a full-blown accounting regression test.

As the AUT nears completion, interest has increasingly focused on system performance and on using automated scripts to measure it. Engineers can now pinpoint areas and actions that have performance problems, and the automation tester can pull similar SHELL scripts and adapt them so that system engineers can gather and analyze performance statistics.

As noted earlier, we have completed installation of the AUT at all eighty sites and are now in the final stages of fixes and features. For everyone involved in the project, the stakes are now higher than ever: For the client's management, a flawed release can cause widespread problems; for my employer, problems could negatively impact the client's final impression of our company; and the test team is concerned because the AUT is so complex and integrated that one small change can affect innumerable areas.

To reduce all of these risks, the test team is now requiring regression tests prior to release of hot fixes as well as the regular releases. They know there is no way their team can manually test all the areas that are covered in the regression test and still remain on schedule. For hot fixes the time is even more critical, and a full regression test is not possible or practical. So instead, when a hot fix is built for release, the team determines the areas the fix impacts and then selects the corresponding automated SHELL scripts to run.
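In its simplest form, selecting SHELLs for a hot fix means filtering by area letter, which is the first character of every SHELL name. The Python sketch below assumes the impacted areas are already known; in practice the team may choose more finely than whole areas. `shells_for_hotfix` and the "P001" purchasing SHELL are hypothetical.

```python
from typing import Iterable, List, Set


def shells_for_hotfix(all_shells: Iterable[str], impacted_areas: Set[str]) -> List[str]:
    """Pick the SHELLs whose area letter (first character) is impacted."""
    return [s for s in all_shells if s[:1] in impacted_areas]


all_shells = [
    "S001 - Simple Sales Order SHELL",
    "P001 - Simple Purchase Order SHELL",
    "S002 - Simple Sales Quote SHELL",
]
print(shells_for_hotfix(all_shells, {"S"}))
```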

Now that we have hired another full-time automation tester, we are able to manage unscheduled hot fix testing (which takes up to a day or two, on top of maintaining two environments) as well as regular regression tests and the scripting of new SHELL scripts. Over the last three weeks we have been able to run regression tests and test a hot fix, and still script eighty new SHELL scripts.



About the author

Penny Witt has worked as a consultant for over fifteen years, devoting the past eight to using automated test tools. Her preference for Rational Robot began in 1994, when the US Customs Department requested an evaluation of the top four test tools. She has not changed her preference since that time.



