Level: Introductory
Penny Witt, Consultant
15 Mar 2003, from The Rational Edge
This case study documents a rapid method of script automation, using Rational®
Robot, that resulted in very efficient test script development and a 100 percent
improvement in script modification time for testing a large application under
test (AUT). Generally, functional tests take hours or days to automate. Further,
when two or three object IDs change, every automated script with those IDs has
to be modified and/or the spreadsheets need to be adjusted. With this rapid approach
to scripting, however, my team is able to develop automated tests in as little
as fifteen or twenty minutes. And when an ID changes in the application, only
one script must be changed to fix 500 to 1,000 tests. Not only that; the automation
is self-documenting and does not require an expert programmer or additional training.
To develop and implement this method, the client, a Fortune 500 lumber company
headquartered in the Pacific Northwest, hired one automation specialist from
Odyssey Software and Consulting; that was me. Within two months, I was able
to help the testers automate more than 100 manual base tests into a regression
suite. By the eighth month, 300 tests were automated, and the company invested
in five more Rational Robot licenses, for a total of eight machines that could
run the functional tests in eight to twelve hours. At the end of the year, management
was so impressed with the advantages of automated testing that they created
a second environment just for general ledger automation, assigned a manual tester
who knew general ledger testing to automate that area (beginning with our existing
accounting test SHELLs), and started searching for a second full-time automation
tester for the rest of the system.
Now that the project is nearing completion, the stakes for success and the
potential for error are at their peak. All eighty sites in the US and Canada
are up and running, which means that any problems in the software release can
have a huge impact. The AUT is so complex and integrated that the test team
is requiring that hot fix releases each have their own regression tests prior
to release and, because we are using this rapid automation method, we are
able to do this without compromising schedule.
Background
The application under test (AUT) was an enterprise-wide
system that encompassed everything from inventory and point-of-sale
to general ledger processes. Designed for use by 1,500 people in
more than eighty locations in the US and Canada, it was written in
C++ with an SQL Server 2000 back-end. When I arrived at the company,
development was in the third year of a four-year plan; the basic
application was fairly stable, but a great many features remained to
be developed.
Testing still consisted of manual tests by five testers. Although
management wanted to automate the knowledge base that the testers
had built, they had not come up with a viable method.
The testers had tried an automated "Record and Playback" method
on basic functions, but they found that maintenance of the automated
tests was too time-consuming and too advanced for them.
In addition, the company was very familiar with running automated
scripts that drew information from data files. They were using this
method full time to load data into the system with each new facility
that went live. But it did not seem to transfer to functional
testing, which had little to do with data and more to do with
calculations, buttons, and varying paths through the AUT.
The company also tried a full-time automated test programmer but
with little success; the programmer did not know the AUT well enough
to facilitate the automation.
Therefore, the problem was how to reap the testers' knowledge
without depending on them to maintain the automated tests. It was
obvious that the learning curve was too great to expect the
automated tester to produce the automated scripts in a vacuum. At
the same time, each manual tester could not be expected to devote
more than two or three hours of a forty-hour week to test
automation. What the development organization needed was a
methodology that would make automated coding fast, yet easy to
understand by testers and programmers alike when problems occurred.
Analysis
After reviewing some standard processes in the AUT, it appeared
that all steps (click on a button, enter a field, select a process,
etc.) were used consistently, but with variations in the order of
use. This meant that each individual action could be programmed into
individual scripts. With this method, we could assemble the scripts
in the order required for each functional process tested.
These rapid scripts had to be so independent that they could not
rely on previous or future scripts. In other words, they could not
include validations that a certain state existed before the action
or that the action completed correctly. Any such validations would
vary, depending on the order in which the rapid scripts were used.
Eliminating validations represents a drastic change from normal
automated programming. Generally, critical areas in a test are
validated so that if the full test fails, there is an indication as
to where the test might have gone wrong. The log at the end of the
test shows each validation result and then an end result for the
entire test. Without such intermediate validations, it is difficult
to determine precisely what might have gone wrong; the log simply
shows a "Fail" for the full test.
Most steps taken in a test are treated as part of the total code
and are not displayed in the log. This makes debugging a script
difficult; there may be a validation showing what area passed or
failed, but the tester still has to read the test code to figure out
what steps were taken. When a programmer is running fifty functional
tests, it is nearly impossible to remember all the steps taken in
each process. The rapid scripts, in contrast, are assembled and
called from a master script. The log shows a pass/fail result for
each step, and the script name describes the step.
Title bars, we found, change not only by area, but also within
the area. This is critical to Windows automation, which bases its
focus on the title bar of a window. As a general rule, one word in
each area remains consistent on the title bar. Because global
variables and libraries are hard to document and trace, we wanted to
limit their use as much as possible. Our goal was to use rapid
scripts that would describe individual steps, so that anyone could
figure out what the test did. Using libraries and variables would
defeat our goal of readability. Instead, the rapid scripts would
hard code the one consistent word in the title bar and use a wild
card for the rest of the title. This decision dictated that all
scripts would be separated by area.
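The wildcard idea can be illustrated outside Robot. The following is a minimal Python sketch, not the project's actual SQABasic code: it assumes the constant word for the Sales area is "Sales", and the window titles are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern: the one word assumed to stay constant in every
# Sales window title, with wildcards standing in for the rest.
SALES_TITLE_PATTERN = "*Sales*"


def title_matches(title: str, pattern: str = SALES_TITLE_PATTERN) -> bool:
    """Return True when a window title contains the area's constant word."""
    return fnmatchcase(title, pattern)


# Made-up window titles, for illustration only.
titles = [
    "Sales Order Entry - Walker Lumber",
    "New Sales Quote",
    "Purchasing - Receive Shipment",
]
matching = [t for t in titles if title_matches(t)]
```

Because the pattern names the area, a script written this way only ever finds windows in its own area, which is one way to see why the decision forced all scripts to be separated by area.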
Next, we needed a tracking method to identify each functional
test as it was processed in the AUT. The point-of-sale system
maintained a field in each transaction to track the number provided
by the customer or supplier, as well as the AUT's own unique
numbering system. This customer/supplier number field was perfect to
use for tracking purposes: It didn't impact processing in the AUT,
yet it was accessible throughout the system to provide user
assistance.
Finally, since we were sharing the test environment with manual
testers, we would need to select specific customers, suppliers, and
item numbers to become the exclusive property of automated testing,
and then set them up with certain properties and restrictions,
depending on the functional tests.
Design
The name we used to track the test in the AUT needed to identify
the functional test, and also indicate possible test cycles. We
decided to create a tracking name that started with the functional
test ID. Then, instead of devising a complex, unique way to identify
test cycles, we tacked on the hour, minute, and second the test was
started. By looking in any transaction queue of the AUT, we could
identify the correct test transaction by the functional test number,
and also identify the cycle by the hour, minute, and second. See
Identifying Tests and Cycles below.
Identifying Tests and Cycles
In the Supplier Reference field for purchasing, the tracking name
for the first functional test was: "P001 - 12:22:04".
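As a rough illustration of this naming scheme, here is a small Python sketch (the project itself built the name in SQABasic). The function names are hypothetical; only the "ID - HH:MM:SS" layout comes from the example above.

```python
from datetime import datetime
from typing import Optional, Tuple


def tracking_name(test_id: str, started: Optional[datetime] = None) -> str:
    """Build a tracking name such as 'P001 - 12:22:04'."""
    started = started or datetime.now()
    return f"{test_id} - {started:%H:%M:%S}"


def split_tracking_name(name: str) -> Tuple[str, str]:
    """Recover the functional test ID and the cycle start time."""
    test_id, cycle = name.split(" - ", 1)
    return test_id, cycle


# Fixed timestamp so the example is reproducible.
example = tracking_name("P001", datetime(2003, 3, 15, 12, 22, 4))
```

Note that a time-of-day suffix distinguishes cycles only within a single day; the article does not say whether a date was ever added.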
An initial naming convention was established to distinguish a
functional test script from the hundreds of other small rapid
scripts that would be developed. As we had already decided to
identify all scripts by area, the first word or letter in a script
name needed to identify the area. For functional test scripts, we
decided to use a single letter for the area, leaving as much room as
possible for the function description. After the letter, we would
number the functional tests in sequence as they were
developed. Following the functional test number would be a brief
description of the test. And finally, we would include the word
SHELL at the end to distinguish functional scripts from rapid
scripts, and to indicate that they held numerous rapid scripts.
Rapid script names would start with the name of the area, followed
by a short description of the script's action. See the next section.
Functional Test Names and Rapid Script Names
The first functional test in sales was
"S001 - Simple Sales Order SHELL"
The menu bar rapid script for File->Open in Sales was
"Sales - Open Transaction".
As noted above, each functional test was a "SHELL script"
composed of many individual rapid scripts. Because the tracking name
of the SHELL script had to be passed to many rapid scripts within
the SHELL, it was put into a global variable at the start of the
SHELL script. To ensure that the system could determine where each
functional test started, the screen would be cleared at the start of
the SHELL script and at the end. See the section on SHELL Scripts
Holding Rapid Scripts below.
SHELL Scripts Holding Rapid Scripts
'''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''S001 - Simple Sales Order SHELL'''''
'''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''GLOBAL VARIABLE'''''''''''''''''''''''''''
ShellScript = "S001 - " + Time$
'''''''''''''''''''''''''''''''''''''''''''''''''''''
CallScript "Sales - File New"
CallScript "Sales - Customer = Walker Lumber"
CallScript "Sales - Transaction Type = Sales Order"
CallScript "Sales - Customer PO"
CallScript "Sales - Tools-Goto Lines"
CallScript "Sales - 12031321020 Lumber Item"
CallScript "Sales - Release Shipment"
CallScript "Sales - Retrieve Order"
CallScript "Sales - Verify Order Released"
CallScript "Sales - File New"
Estimating that there would be hundreds of tests in each area, we
knew we would need to run numerous machines at the same time. We
also knew there was potential for unrealistic record locks when
certain areas crossed over into others during a function. To avoid
this problem, we assigned each area its own set of customers and
suppliers. Each machine ran a single area using a master script, as
shown in the section below.
Master SHELL Script
''''''''''''''''''''''''''''''''''''''''''
'''''MASTER SALES SHELL'''''
''''''''''''''''''''''''''''''''''''''''''
CallScript "S001 - Simple Sales Order SHELL"
CallScript "S002 - Simple Sales Quote SHELL"
CallScript "S003 - Simple Sales Adjustment SHELL"
CallScript "S004 - Simple Credit Memo SHELL"
CallScript "S005 - Simple Material Movement SHELL"
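The relationship between rapid scripts, SHELL scripts, and a master script can be modeled in a few lines of Python. This is only a sketch of the structure, not how Rational Robot executes CallScript: the rapid scripts here are stand-in functions that return a canned result, one failure is simulated, and the choice to keep running after a failed step is an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for rapid scripts: each performs one action and
# reports whether it succeeded. Real rapid scripts would drive the GUI.
RAPID_SCRIPTS: Dict[str, Callable[[], bool]] = {
    "Sales - File New": lambda: True,
    "Sales - Customer PO": lambda: True,
    "Sales - Release Shipment": lambda: False,  # simulated failure
}

# A SHELL is just an ordered list of rapid script names.
SHELLS: Dict[str, List[str]] = {
    "S001 - Simple Sales Order SHELL": [
        "Sales - File New",
        "Sales - Customer PO",
        "Sales - Release Shipment",
        "Sales - File New",
    ],
    "S002 - Simple Sales Quote SHELL": [
        "Sales - File New",
        "Sales - Customer PO",
        "Sales - File New",
    ],
}


def run_shell(name: str) -> List[Tuple[str, str]]:
    """Run every step of one SHELL and log a Pass/Fail entry per step."""
    log = []
    for step in SHELLS[name]:
        passed = RAPID_SCRIPTS[step]()
        log.append((step, "Pass" if passed else "Fail"))
    return log


def run_master(shell_names: List[str]) -> Dict[str, str]:
    """Run SHELLs in order; a SHELL fails if any of its steps failed."""
    results = {}
    for name in shell_names:
        log = run_shell(name)
        results[name] = "Fail" if any(r == "Fail" for _, r in log) else "Pass"
    return results


s001_log = run_shell("S001 - Simple Sales Order SHELL")
summary = run_master(list(SHELLS))
```

Since each log entry carries the rapid script's name, the log reads as a list of the steps taken, which is the self-documenting property the approach relies on.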
Development
It took a month to produce the first thirty functional tests,
each of which contained ten to thirty rapid scripts. By the second
month we had added fifty more tests. As we created more functional
tests we had to create fewer and fewer rapid scripts, because so
many already existed. In eight months we had about 300 functional
tests running off around 450 rapid scripts.
Generally, we could create three functional tests an hour. The
technique was to copy a SHELL script, which started out similar to
the new function, then change or add rapid scripts where the new
function varied.
The build cycle (small bug fixes to the AUT) did not seem to have
much effect on the regression suite that was developing from the
test SHELLs. But new features had significant impact on the number
of broken scripts. Every area the feature touched had new object IDs
and required changes to rapid scripts. Here, too, the benefit of
these rapid scripts was obvious: When we fixed the ID in one rapid
script, it was automatically fixed in every SHELL script that used
the same rapid script. Therefore, even though the SHELL scripts were
increasing tenfold, we did not have a tenfold increase in IDs to
correct.
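The single point of change described above can be sketched in Python as follows. The object IDs and the SHELL contents are invented for illustration; the only point is that SHELLs refer to rapid scripts by name, so an ID lives in exactly one place.

```python
from typing import Dict, List

# Hypothetical object IDs. Each ID is recorded in exactly one rapid script.
RAPID_SCRIPT_IDS: Dict[str, str] = {
    "Sales - BUTTON Continue": "continue_button_v1",
    "Sales - TAB Delivery": "delivery_tab_v1",
}

# SHELLs refer to rapid scripts by name only; they never hold an object ID.
SHELLS: Dict[str, List[str]] = {
    "S001 - Simple Sales Order SHELL": [
        "Sales - BUTTON Continue",
        "Sales - TAB Delivery",
    ],
    "S002 - Simple Sales Quote SHELL": ["Sales - BUTTON Continue"],
    "S003 - Simple Sales Adjustment SHELL": [
        "Sales - TAB Delivery",
        "Sales - BUTTON Continue",
    ],
}


def resolve(shell_name: str) -> List[str]:
    """Look up the current object ID behind each step of a SHELL."""
    return [RAPID_SCRIPT_IDS[step] for step in SHELLS[shell_name]]


def shells_using(rapid_script: str) -> List[str]:
    """List the SHELLs that pick up a fix made to one rapid script."""
    return [name for name, steps in SHELLS.items() if rapid_script in steps]


# A new feature changes the button's ID: one edit, in one place.
RAPID_SCRIPT_IDS["Sales - BUTTON Continue"] = "continue_button_v2"
affected = shells_using("Sales - BUTTON Continue")
```

In this toy setup, one edit repairs all three SHELLs; with hundreds of SHELLs sharing a rapid script, the saving grows accordingly.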
The scripts were a great success with the testers. They would
work with the programmer for about two hours a week to put together
new SHELL scripts. These scripts covered redundant testing that
testers had been doing to test each new build. As they had to do
less and less redundant testing, they became increasingly
supportive of automated testing. The scripts were fairly
self-explanatory: one rapid script for each step. When the
regression test was run, the programmer would first analyze any
failures to determine whether each was a script failure (for example, a
wrong ID or new fields). If the failure did not appear to be caused by a
problem with the automated script, the tester would take
responsibility for analyzing the AUT problem.
Naming Conventions
It became clear within a short period of time that the naming
convention needed to be more controlled. Initially, the rapid
scripts were named for whatever the testers called the action. But after a
while, it was apparent that those names were not always the standard
terminology for the actions and often each tester would call the
same action by a different name. Plus, the rapid script name did not
seem to relate to any function keys displayed on the screen, which
made it hard for both the programmer and other testers to be sure
what the rapid script was doing. When testers specified "Book the
Transaction" in a script, it was the end result of steps that
started by selecting Tools->Record Receiving Results from
the menu bar, and then selecting Tools->Done Receiving.
Further, some actions stated the name correctly, but gave no
indication as to how you would achieve that action manually. For
example, if the action specified "Releasing Transaction," it could
have meant either selecting Shipment->Release from the
menu bar, or clicking on the Release icon on the toolbar. The rapid
script for opening a search panel was named for the type of item you
wanted to search for, but the name did not indicate if it was
clicking on a button, going to the menu bar, or right clicking on
the mouse. Therefore, if the functional test failed, it was hard to
know exactly what steps to take to reproduce this failure manually.
Probably most confusing was looking for the right rapid script in
a list of 400 such scripts. Sometimes they would be listed under a
colloquial name and other times by screen function name. We imposed
stricter rules to name rapid scripts by specifying the area first,
then the screen action taken, and then any data name used in the
action. In this way, areas would alphabetically fall together, all
the push-button actions would be listed together, and all tab
actions would be listed together. When it was a field entry, the
label beside the field was put in after the area, followed by an "="
sign and the field data entered. Finally, menu-bar selections listed
the area, the name on the menu bar, a hyphen, and then the name on
the drop-down menu. See the Naming Convention example below.
Naming Convention
Sales - Customer = Alen Afery
Sales - Customer = Wayne World
Sales - BUTTON Continue
Sales - BUTTON Go
Sales - File-New
Sales - File-Open
Sales - TAB Additional
Sales - TAB Delivery
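One benefit of a strict convention is that script names become machine-readable. The Python sketch below is an illustration only, with hypothetical function and key names: it splits a rapid script name into area, kind of action, and data, following the rules described above.

```python
from typing import Dict


def classify(name: str) -> Dict[str, str]:
    """Split a rapid script name into area, action kind, target, and value."""
    area, _, rest = name.partition(" - ")
    if " = " in rest:  # field entry: label = data entered
        label, _, value = rest.partition(" = ")
        return {"area": area, "kind": "field", "target": label, "value": value}
    for keyword, kind in (("BUTTON ", "button"), ("TAB ", "tab")):
        if rest.startswith(keyword):
            return {"area": area, "kind": kind,
                    "target": rest[len(keyword):], "value": ""}
    if "-" in rest:  # menu bar name, hyphen, drop-down menu name
        menu, _, item = rest.partition("-")
        return {"area": area, "kind": "menu", "target": menu, "value": item}
    return {"area": area, "kind": "other", "target": rest, "value": ""}


names = [
    "Sales - TAB Delivery",
    "Sales - File-Open",
    "Sales - Customer = Wayne World",
    "Sales - BUTTON Go",
]
# Plain alphabetical sorting groups the names by area, then by action type.
ordered = sorted(names)
```

A check like this could also flag names that fall into the "other" bucket and therefore do not follow the convention.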
Establishing Consistency
A problem that took a while to become apparent was that we were
not consistently implementing our rapid approach across the board.
At first, we put actions that seemed to run together into one
script. The script "Purchasing - Tools-Record Received" was a menu
bar selection that also included the next menu bar selection,
"Shipment->Done Received," since both were always done in
conjunction. But as functional tests became more complex, we found
that there were too many other things that could be done between
selecting "Tools->Record Received" and selecting
"Shipment->Done Received." So we had to go back and separate the
original rapid script into two separate scripts in all the SHELL
scripts.
Hot Keys were not initially utilized as much as they should have
been. Their use eliminated changing Object IDs in areas where the
code was re-worked, so we began using them more instead of clicking
on Objects. The switch did not seem to increase or decrease the
number of AUT failures we discovered in automated testing.
Further, as we got used to the changing Object IDs in new
releases, we found that some of those objects could be isolated into
one script and taken out of multiple scripts in which the other
objects did not change. One example was a double-click on an object
that brought up a detailed screen to do various functions. Since the
double-clicked object ID always seemed to change, and the functions
on the detailed screens rarely required ID changes, the double-click
was put into a separate rapid script.
This breaking up of rapid scripts into smaller scripts was a
massive undertaking. It was anybody's guess which SHELL scripts
contained which rapid scripts. A program was developed to go through
each SHELL, check for the old rapid script, and replace it with the
new rapid scripts. But in some cases, that required multiple changes
that could not be easily programmed, so each SHELL had to be
reviewed.
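The article does not show the program that updated the SHELLs. A minimal Python sketch of the idea might look like the following; it assumes SHELL sources are plain text with one CallScript statement per line, and the SHELL name and the second rapid script name are hypothetical.

```python
from typing import List


def split_rapid_script(shell_source: str, old: str, new: List[str]) -> str:
    """Replace each call to one rapid script with calls to several."""
    old_line = f'CallScript "{old}"'
    out = []
    for line in shell_source.splitlines():
        if line.strip() == old_line:
            # Preserve whatever indentation the original call had.
            indent = line[: len(line) - len(line.lstrip())]
            out.extend(f'{indent}CallScript "{name}"' for name in new)
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical SHELL source, for illustration only.
shell = "\n".join([
    "'P010 - Simple Receipt SHELL",
    'CallScript "Purchasing - File-Open"',
    'CallScript "Purchasing - Tools-Record Received"',
    'CallScript "Purchasing - File-New"',
])

updated = split_rapid_script(
    shell,
    old="Purchasing - Tools-Record Received",
    new=[
        "Purchasing - Tools-Record Received",
        "Purchasing - Shipment-Done Received",
    ],
)
```

A mechanical rewrite like this is not idempotent when the old name is reused for the first half of the split (running it twice would add the second call twice), and it cannot decide whether other steps belong between the two new calls, which is consistent with the need to review some SHELLs by hand.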
Verification of the functional test was another revised area.
SHELL scripts usually ended the list of rapid scripts with one rapid
script that verified the result of the functional test. After one
particularly large release requiring major ID modifications, we
noted that most verification scripts were used by only one SHELL
script. Since the idea behind rapid scripts was to re-use them over
and over in many scripts, verification scripts did not meet the
criteria. In fact, they actually made the correction of object IDs
harder: The user was required to go to a second script for every
SHELL script verification. Therefore, verification of the functional
SHELL script was coded into the end of the SHELL and deleted as a
rapid script. This meant the SHELL script was not as clean and easy
for a layperson to read.
As the programmer became more familiar with the application, she
discovered that many of the exact same dialogue boxes popped up in
various areas of the AUT. Each of these dialogue boxes had its own
title bar, which did not change regardless of the area from which the
box was accessed. Since all rapid scripts were named by area, this meant
that these dialogues were being programmed over and over again, with at
least one rapid script for each area. So we developed a new area and
named it "U" for universal rapid scripts. The script names in
dialogue box. This cut down on the number of redundant scripts and
is continuing to do so.
Multiple Environments and Releases
As the weekly regression test run began to prove itself valuable,
management wanted to increase leverage for the automation by testing
hot fixes that came out in between releases of the AUT. This was a
big hurdle to overcome on two fronts.
First, the environment to test the hot fixes needed to be a
separate database reflecting what was being used in the field. The
future release we tested every week had a database with future
changes; the release being used in the field did not have this
database.
Second, new builds were being put into the future release daily.
This new code introduced Object ID changes that would not be in the
hot fix code. Therefore, some Object IDs in rapid scripts might
vary, depending on whether they were running on the future code
under test or the hot fix code.
Until this point, we had been able to get by with only one global
variable: the tracking name of each test that was passed from rapid
script to rapid script within a SHELL script. Now, it looked as if
we needed at least one more global variable to indicate what release
the scripts were to test.
Expanding Automation
Automated testing had only one programmer, who worked with the
testers developing new tests three days a week, and then ran
regression tests the last two days of the week. The thought was that
the new build would first be checked by the automated regression
test to see if any of the new code broke existing code in the AUT.
Then, the testers could start their week knowing that basic code was
unchanged; or, in the worst-case scenario, they would know what
didn't need to be tested, because it had to be re-worked.
By this time, the regression test consisted of more than 300
SHELL tests running on eight machines. The two days of regression
testing now took place within a fairly consistent format. The first
eight to twelve hours consisted of keeping all the machines running.
There were still crashes, especially in areas where the most
features were being developed. The programmer would simply note the
failed script and re-start the Master SHELL on the next script. When
all scripts were done, it was a matter of going back to the list of
failed scripts and re-running them to determine the cause: Was it an
automated script failure or an AUT failure? We developed a
spreadsheet listing all the automated SHELL scripts and sent it out,
noting the Pass/Fail results of the regression test. If the SHELL
was marked "Fail," that indicated that there might be an AUT
failure, and the testers could research the problem. Sometimes
research meant watching the SHELL run, or working through the steps
manually while referencing the list of rapid scripts in the SHELL.
The introduction of a second environment and the need to maintain
alternating code required another person, but there was much trial
and error to determine what skills that person needed. This form of
programming was generally too extreme for anyone with previous
programming experience, not because it was too complex but, in fact,
for just the opposite reason.
Most programmers use coding devices such as code libraries,
global variables, arrays, loops, go to and function statements to
create an application. But our approach to automated programming
discourages this type of coding in favor of descriptions of clear-cut
actions that were completed to achieve the test. Unlike normal
programming code, which is not readily understood by
non-programmers, and requires a great deal of documentation for
maintenance, these descriptions are written in plain language as the
script name. Obviously, a programmer, or even an aspiring
programmer, would not be long for this job.
The ideal candidate was a tester willing to leverage his or her
testing experience to work with our basic programming technique. Our
initial need was to run regression tests and learn to analyze them.
We knew the analysis would take twice as long now that we were
running in two environments, and testing both future code and hot
fixes. The important part of analysis would be understanding what
the functional test was supposed to do. Every time a script failed,
we would have to determine whether the problem was the AUT or
whether the script did not adapt to the release. A future release
might very well have a new feature that the hot fix did not have.
When the automated code was looking for a different object ID or
data, an IF statement would be needed to determine if the script
should read for future code/data or for hot fix code/data.
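The release switch described here can be sketched in Python. The variable name, the release labels, and the object IDs below are all hypothetical; the article states only that a second global variable and an IF statement were needed.

```python
from typing import Dict

# Hypothetical second global: which code line this run is testing.
RELEASE = "hotfix"  # or "future"

# Hypothetical object IDs that differ between the two code lines.
OBJECT_IDS: Dict[str, Dict[str, str]] = {
    "Sales - BUTTON Continue": {
        "future": "continue_button_v2",
        "hotfix": "continue_button_v1",
    },
}


def object_id_for(rapid_script: str, release: str) -> str:
    """Pick the object ID that matches the release under test."""
    ids = OBJECT_IDS[rapid_script]
    if release == "future":
        return ids["future"]
    if release == "hotfix":
        return ids["hotfix"]
    raise ValueError(f"unknown release: {release!r}")


current_id = object_id_for("Sales - BUTTON Continue", RELEASE)
```

Keeping the branch inside the rapid script, rather than in each SHELL, preserves the single point of change for object IDs.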
Much Accomplished, More to Do
We have been using this rapid test automation approach for over a
year and a half; currently, we have more than 500 SHELL scripts and
1,200 rapid scripts. We have covered most areas of the application
at a high level, but we must do a great deal more to complete those
areas.
Regression test time has not increased significantly with this
growth in the number of tests. More and more of the test areas are
becoming stable, with few if any new changes. Also, as the AUT
improved, the accessing of the Customer PO or Supplier Reference
Number became more prevalent throughout the system. This is the same
field used by our SHELL scripts to identify the test, so our scripts
were able to access orders for validations more efficiently. AUT
improvements have eliminated much of the previous test set-up and
test name searching.
Scripting might have gone more smoothly if all team members had
more strictly observed the standards for the selectors and hot keys.
As with most projects under deadline, team members often treated
these standards as enhancements, and generally gave them a back
seat. Once management started seeing the benefits of test
automation, however, we began enforcing the standards more
rigorously, because some areas could not be scripted without them.
As time has gone by, the testers have begun to rely more on
automation to do initial runs in the areas they test. Conversely,
they seem to have even less time to develop new automated scripts.
As a half measure, we require manual test documentation on each new
bug fix the testers finish testing. This new test documentation is
then used to create new SHELL scripts or to add to existing
SHELL scripts.
In addition, management was so sold on the advantages of test
automation that they created a closed environment devoted strictly
to an automated balancing of the general ledger account. This
ensured data integrity and allowed for resulting data to be cleared
for each new test cycle. A manual tester who had been doing general
ledger testing was able to build on initial SHELL scripts in the
accounting areas to create a full-blown accounting regression test.
As the AUT nears completion, interest has focused more and more
on system performance and on using automated scripts to examine it. Engineers can now
pinpoint areas and actions that have performance problems, and the
automation tester can pull similar SHELL scripts and adapt them to
run so that system engineers can gather and analyze performance
statistics.
As noted earlier, we have completed installation of the AUT at
all eighty sites and are now in the final stages of fixes and
features. For everyone involved in the project, the stakes are now
higher than ever: For the client's management, a flawed release can
cause widespread problems; for my employer, problems could
negatively impact the client's final impression of our company; and
the test team is concerned because the AUT is so complex and
integrated that one small change can affect innumerable areas.
To reduce all of these risks, the test team is now requiring
regression tests prior to release of hot fixes as well as the
regular releases. They know there is no way their team can manually
test all the areas that are covered in the regression test and still
remain on schedule. For hot fixes the time is even more critical,
and a full regression test is not possible or practical. So instead,
when a hot fix is built for release, the team determines the areas
the fix impacts and then selects the corresponding automated SHELL
scripts to run.
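Because every SHELL name starts with its area letter, picking the subset for a hot fix can be as simple as a prefix filter. The Python sketch below assumes the letters S and P for Sales and Purchasing, as in the examples earlier in the article; the purchasing SHELL name and the function name are hypothetical.

```python
from typing import Dict, Iterable, List

# Area letters as used in the article's examples (S001, P001).
AREA_LETTERS: Dict[str, str] = {"Sales": "S", "Purchasing": "P"}


def select_shells(all_shells: Iterable[str],
                  impacted_areas: Iterable[str]) -> List[str]:
    """Keep only the SHELLs whose area letter matches an impacted area."""
    letters = tuple(AREA_LETTERS[area] for area in impacted_areas)
    return [name for name in all_shells if name.startswith(letters)]


shells = [
    "S001 - Simple Sales Order SHELL",
    "S002 - Simple Sales Quote SHELL",
    "P001 - Simple Purchase Order SHELL",  # hypothetical name
]
hot_fix_run = select_shells(shells, ["Purchasing"])
```

Deciding which areas a fix actually impacts remains a judgment call for the team; the filter only automates the selection that follows.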
Since we hired another full-time automation tester, we are able
to manage unscheduled hot fix testing (which takes up to a day or
two in addition to maintaining two environments) as well as
regular regression tests and the scripting of new SHELL scripts.
Over the last three weeks we have been able to do regression tests
and a hot fix, and still script eighty new SHELL scripts.
About the author
Penny Witt has worked as a consultant for over fifteen years, devoting the past eight to using automated test tools. Her preference for Rational Robot began in 1994, when the US Customs Department requested an evaluation of the top four test tools. She has not changed her preference since that time.