Strategies for refactoring untestable PHP code

Unit test and refactor legacy PHP code to make testing easier and improve code quality

With the growth of PHP from a simple scripting language to a full-fledged programming language, there has been a parallel growth in the complexity of the code bases of typical PHP applications. To keep the support and maintenance of these applications under control, developers rely on testing tools that automate the process. One such method, unit testing, lets you check the code you write directly for correctness. However, legacy code bases often aren't amenable to this kind of testing. This article looks at strategies for refactoring common problematic PHP code so that it is easier to test with popular unit testing tools, while reducing dependencies to improve your code base.

John Mertic (jmertic@gmail.com), Software Engineer, SugarCRM

John Mertic is a software engineer at SugarCRM and has several years of experience with PHP web applications. At SugarCRM, he has specialized in data integration, mobile, and user interface architecture. An avid writer, he has been published in php|architect, IBM developerWorks, and in the Apple Developer Connection, and is the author of the book "The Definitive Guide to SugarCRM: Better Business Applications." He has also contributed to many open source projects, most notably the PHP project, where he is the creator and maintainer of the PHP Windows Installer.



07 June 2011


Introduction

Looking back on the 15 years of PHP, we see that it has grown from a simple, dynamic scripting language, an alternative to the CGI scripts popular at the time, into the full-fledged programming language it is today. As an application's code base grows, manual testing becomes an impossible task, and every code change, big or small, could affect the entire application. The effects may be as simple as a page not loading or a form not saving, or something subtler that is hard to detect or only shows up under certain circumstances. A change could even cause a previously fixed issue to reappear in the application. Various testing tools have been developed to solve these problems.

One popular method is known as functional or acceptance testing, which tests the application through typical user interactions. This is a good technique for testing the various processes in the application, but it can be very slow and generally does not do as good a job of testing the low-level classes and functions to make sure that they work as intended. This is where another method of testing, unit testing, comes into play. Its goal is to test the underlying code of the application directly to ensure that it produces the correct results when executed. Often, these "grown up" web applications accumulate a lot of legacy code that over time becomes difficult to test, which limits how much test coverage a development team can provide. This is commonly referred to as "untestable code". Let's look at how to identify it in your application and how to fix it.


Identifying untestable code

The problem areas of your code base that are untestable are often not apparent when the code is written. When you write code for a PHP application, there is a tendency to tailor it to how the web request flows, often taking a more procedural approach to the application design. An urgency to finish the project, or to fix the application in a hurry, can cause developers to "cut corners" to complete the code quickly. Existing poorly written or confusing code can compound the untestability of an application, because developers will often make the least risky fix possible even if it compounds support issues down the road. Without considerable effort, these problem areas are very hard to cover with unit testing tools.


Functions depending on the global state

Global variables are a convenience in PHP applications. They allow you to have variables or objects that can be initialized early in your application, and then can be leveraged anywhere in the application. However, that flexibility comes at a cost, as the heavy use of global variables is a common problem seen in untestable code. We can see this in Listing 1.

Listing 1. Function that depends upon the global state
<?php 
function formatNumber($number) 
{ 
    global $decimal_precision, $decimal_separator, $thousands_separator; 
     
    if ( !isset($decimal_precision) ) $decimal_precision = 2; 
    if ( !isset($decimal_separator) ) $decimal_separator = '.'; 
    if ( !isset($thousands_separator) ) $thousands_separator = ','; 
     
    return number_format($number, $decimal_precision, $decimal_separator, $thousands_separator); 
}
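To make the testing cost concrete, here is a minimal sketch of a hand-written check for the Listing 1 function without PHPUnit. It sets the three globals for the test and restores the previous global state afterward, which is roughly the bookkeeping that PHPUnit's global-variable backup automates. The helper `runWithGlobals()` is a hypothetical name introduced for illustration.

```php
<?php
// Listing 1, unchanged: the function reads (and, when they are unset, writes) three globals.
function formatNumber($number)
{
    global $decimal_precision, $decimal_separator, $thousands_separator;

    if ( !isset($decimal_precision) ) $decimal_precision = 2;
    if ( !isset($decimal_separator) ) $decimal_separator = '.';
    if ( !isset($thousands_separator) ) $thousands_separator = ',';

    return number_format($number, $decimal_precision, $decimal_separator, $thousands_separator);
}

// Hypothetical helper: set the given globals, run $test, then restore the
// previous global state even if the test throws.
function runWithGlobals(array $values, callable $test)
{
    $saved = array();
    foreach ($values as $name => $value) {
        // Remember whether each global existed and what it held.
        $saved[$name] = array_key_exists($name, $GLOBALS)
            ? array(true, $GLOBALS[$name])
            : array(false, null);
        $GLOBALS[$name] = $value;
    }

    try {
        return $test();
    } finally {
        foreach ($saved as $name => $entry) {
            if ($entry[0]) {
                $GLOBALS[$name] = $entry[1];   // put the old value back
            } else {
                unset($GLOBALS[$name]);        // it did not exist before the test
            }
        }
    }
}

$result = runWithGlobals(
    array('decimal_precision' => 1, 'decimal_separator' => ',', 'thousands_separator' => '.'),
    function () { return formatNumber(1234.56); }
);
echo $result, PHP_EOL; // 1.234,6
```

Every global the function touches has to be listed by hand; any one that is forgotten leaks into the tests that run afterward.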

Two different issues arise because of these global variables. The first is that you need to account for each of them in your test, making sure you set them to valid values that the function expects. The second, bigger issue is that you must restore the global state to what it was before the test ran; otherwise you change the state seen by subsequent tests and can invalidate their results. PHPUnit has facilities that can back up global variables and restore them after your test runs, which can help alleviate the issue. However, a better approach is to provide a way for the test to pass in values for these globals directly, so that the function can use them. Listing 2 shows an example of how to do this.

Listing 2. The function fixed to allow overriding the global variables
<?php 
function formatNumber($number, $decimal_precision = null, $decimal_separator = null, $thousands_separator = null) 
{ 
    // Use the globals only for settings the caller did not pass in; "global"
    // rebinds the local name to the global variable by reference.
    if ( is_null($decimal_precision) ) global $decimal_precision; 
    if ( is_null($decimal_separator) ) global $decimal_separator; 
    if ( is_null($thousands_separator) ) global $thousands_separator; 
     
    // Fall back to defaults for anything still unset (for a name bound to a
    // global, this also stores the default in that global, as in Listing 1).
    if ( !isset($decimal_precision) ) $decimal_precision = 2; 
    if ( !isset($decimal_separator) ) $decimal_separator = '.'; 
    if ( !isset($thousands_separator) ) $thousands_separator = ','; 
     
    return number_format($number, $decimal_precision, $decimal_separator, $thousands_separator);
}

Doing this not only makes the code more testable, but also means the function no longer has to depend on the global variables. This opens up the possibility of refactoring the code down the road to not use the global variables at all.
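To illustrate where that refactoring might eventually lead, here is a minimal sketch in which the formatting settings travel in an explicit array with built-in defaults, so the function reads no globals and has no side effects. The function name `formatNumberWith()` and the settings keys are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical global-free variant: every setting comes from the caller,
// with defaults matching the ones Listing 1 used.
function formatNumberWith($number, array $settings = array())
{
    // Array union keeps the caller's keys and fills in only the missing ones.
    $settings += array(
        'decimal_precision'   => 2,
        'decimal_separator'   => '.',
        'thousands_separator' => ',',
    );

    return number_format(
        $number,
        $settings['decimal_precision'],
        $settings['decimal_separator'],
        $settings['thousands_separator']
    );
}

echo formatNumberWith(1234.567), PHP_EOL;                                  // 1,234.57
echo formatNumberWith(1234.567, array('decimal_precision' => 0)), PHP_EOL; // 1,235
```

A test can now call the function with whatever settings it needs and has nothing to clean up afterward. Existing callers that rely on the globals could keep working through a thin wrapper that reads them and passes them in, which confines global access to one place.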


Singletons that can't be reset

Singletons are classes that are designed to have only one instance existing at a time in an application. They are a common pattern for global objects in an application, such as database connections and configuration settings. They are often considered taboo, a reputation many developers consider undeserved given how useful an always-available object can be. Much of this reputation comes from the overuse of singletons, since many of these so-called god objects can be impossible to extend. But from a testing perspective, a big problem is that once the instance is created, it often cannot be reset or replaced. Let's look at Listing 3 as an example.

Listing 3. Singleton object we are looking to test
<?php 
class Singleton 
{ 
    private static $instance; 
     
    protected function __construct() { } 
    private function __clone() {} // private: outside code cannot clone the instance
     
     
    public static function getInstance() 
    { 
        if ( !isset(self::$instance) ) { 
            self::$instance = new Singleton; 
        } 
         
        return self::$instance; 
    } 
}

As you can see, after the singleton is instantiated the first time, every call to the getInstance() method returns the same object rather than a new one. This becomes a big problem in tests: any change one test makes to that object is still present in every test that runs after it. The easiest solution is to add a method to the class that resets it. Listing 4 shows such an example.

Listing 4. Singleton object with a reset method added
<?php 
class Singleton 
{ 
    private static $instance; 
     
    protected function __construct() { } 
    private function __clone() {} // private: outside code cannot clone the instance
     
     
    public static function getInstance() 
    { 
        if ( !isset(self::$instance) ) { 
            self::$instance = new Singleton; 
        } 
         
        return self::$instance; 
    } 
     
    public static function reset() 
    { 
        self::$instance = null; 
    } 
}

Now we can call the reset() method at the start of each test to ensure that every test goes through the singleton's initialization code. Having this method can be helpful in the application in general as well, because the singleton instance can now be discarded and re-created when needed.
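The following sketch shows why the reset matters, without depending on PHPUnit. It gives Listing 4's class a hypothetical public `$counter` property to stand in for real state (configuration, cached data, and so on), runs the same "test" several times, and shows that the state leaks between runs unless `reset()` is called first. In a PHPUnit test case, the `reset()` call would typically go in `setUp()`.

```php
<?php
class Singleton
{
    private static $instance;

    // Hypothetical piece of state, standing in for whatever the real singleton holds.
    public $counter = 0;

    protected function __construct() { }
    private function __clone() { }

    public static function getInstance()
    {
        if ( !isset(self::$instance) ) {
            self::$instance = new Singleton;
        }

        return self::$instance;
    }

    public static function reset()
    {
        self::$instance = null;
    }
}

// A "test" that changes the singleton's state and reports what it saw.
function testIncrementsOnce()
{
    $s = Singleton::getInstance();
    $s->counter++;
    return $s->counter;
}

// Without a reset, the second run sees state left over by the first.
$first  = testIncrementsOnce();   // 1
$second = testIncrementsOnce();   // 2

// With a reset before each run, every run starts from a fresh instance.
Singleton::reset();
$third  = testIncrementsOnce();   // 1
Singleton::reset();
$fourth = testIncrementsOnce();   // 1
```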


Working in the class constructor

A good practice when unit testing is to test only exactly what you intend to, and to avoid setting up more objects and variables than you absolutely need. Every object and variable you set up is also one you need to remove afterward. This becomes a problem for more troublesome resources such as files and database tables, where if you need to modify their state you must be extra careful to clean up after the test completes. The biggest barrier to following that rule is often the constructor of the object itself, which does all sorts of things that aren't pertinent to your test. Consider Listing 5 for an example.

Listing 5. Class with a large constructor
<?php 
class MyClass 
{ 
    protected $results; 
     
    public function __construct() 
    { 
        $dbconn = new DatabaseConnection('localhost','user','password'); 
        $this->results = $dbconn->query('select name from mytable'); 
    } 
     
    public function getFirstResult() 
    { 
        return $this->results[0]; 
    } 
}

Here, in order to test the getFirstResult() method, we end up needing to set up a database connection, have records in the table, and then clean up all these resources afterward. This seems like overkill when none of this is needed to test getFirstResult(). Therefore, let's modify the constructor as shown in Listing 6.

Listing 6. Class modified to optionally skip all the unneeded initialization logic
<?php 
class MyClass 
{ 
    protected $results; 
     
    public function __construct($init = true) 
    { 
        if ( $init ) $this->init(); 
    } 
     
    public function init() 
    { 
        $dbconn = new DatabaseConnection('localhost','user','password'); 
        $this->results = $dbconn->query('select name from mytable');
    } 
     
    public function getFirstResult() 
    { 
        return $this->results[0]; 
    } 
}

We've moved the code out of the constructor and into an init() method, which the constructor still calls by default so that existing code doesn't break. Now, however, we can pass a boolean false to the constructor during our tests to skip the init() method and all of the unneeded initialization logic. This refactoring also improves the class design by separating the initialization code from the object construction code.
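Skipping init() leaves $results empty, so a test still needs some way to supply data. One minimal approach, sketched here as an assumption rather than as part of the original listing, is a small test-only subclass that fills in the protected property. The subclass name `TestableMyClass` and its `setResults()` method are hypothetical. Because init() never runs, the `DatabaseConnection` class is never needed.

```php
<?php
// Listing 6's class, unchanged apart from comments.
class MyClass
{
    protected $results;

    public function __construct($init = true)
    {
        if ( $init ) $this->init();
    }

    public function init()
    {
        // Only runs when $init is true; the code below never reaches it.
        $dbconn = new DatabaseConnection('localhost','user','password');
        $this->results = $dbconn->query('select name from mytable');
    }

    public function getFirstResult()
    {
        return $this->results[0];
    }
}

// Hypothetical test-only subclass that lets a test inject canned results.
class TestableMyClass extends MyClass
{
    public function setResults(array $results)
    {
        $this->results = $results;
    }
}

// Construct without touching the database, then supply exactly the data needed.
$obj = new TestableMyClass(false);
$obj->setResults(array('alice', 'bob'));
echo $obj->getFirstResult(), PHP_EOL; // alice
```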


Having hard coded class dependencies

As we saw in the previous section, a huge problem in class design that makes testing difficult is having to initialize all sorts of objects that your test doesn't require. Previously, we saw how heavy initialization logic can add all sorts of overhead to writing a test (especially when the test doesn't need any of it to succeed), but another problem occurs when we directly create new objects inside methods of the class we are testing. Let's look at Listing 7 for an example of such problematic code.

Listing 7. Class that has a method that directly initializes another object
<?php 
class MyUserClass 
{ 
    public function getUserList() 
    { 
        $dbconn = new DatabaseConnection('localhost','user','password'); 
        $results = $dbconn->query('select name from user'); 
         
        sort($results); 
         
        return $results; 
    } 
}

Let's say we are testing the getUserList() method above, and the focus of our test is to make sure that the returned user list is properly sorted alphabetically. In this case, fetching the records from the database doesn't really matter, since what we are testing is our ability to sort the records coming back. The problem is that since we directly instantiate a database connection object inside the method, we are forced to do all of this scaffolding work to test the method properly. Therefore, let's make a change that allows an object to be injected, as shown in Listing 8.

Listing 8. Class that has a method that directly initializes another object, but also provides a way to override it
<?php 
class MyUserClass 
{ 
    public function getUserList($dbconn = null) 
    { 
        if ( !isset($dbconn) || !( $dbconn instanceOf DatabaseConnection ) ) { 
            $dbconn = new DatabaseConnection('localhost','user','password'); 
        } 
        $results = $dbconn->query('select name from user'); 
         
        sort($results); 
         
        return $results; 
    } 
}

Now you can pass in an object that is compatible with the expected database connection object, and the method will use that object instead of creating a new one. Because of the instanceof check, the object must be a DatabaseConnection or a subclass of it; anything else is silently replaced by a new default connection. The object you pass in can simply be a mock object, meaning one where we hard-code the return values of the methods that get called so that they provide the data we wish to use. In this case, we would mock the query() method of the database connection object so that it simply returns results instead of calling out to the database for them. This kind of refactoring can also improve the method itself, allowing your application to inject different database connections instead of being tied only to the default one.
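A minimal hand-rolled version of such a mock, without a mocking library, might look like the following sketch. Because Listing 8 checks `instanceof DatabaseConnection`, the fake has to extend that class; a stand-in `DatabaseConnection` is defined here only so the example runs on its own, and both it and `FakeDatabaseConnection` are hypothetical. PHPUnit's test doubles can generate this kind of stub for you instead.

```php
<?php
// Hypothetical stand-in for the real class; query() would normally hit the database.
class DatabaseConnection
{
    public function __construct($host, $user, $password) { }

    public function query($sql)
    {
        throw new RuntimeException('No real database is available in this sketch');
    }
}

// Listing 8, unchanged apart from comments.
class MyUserClass
{
    public function getUserList($dbconn = null)
    {
        // Fall back to a real connection only when no usable one was injected.
        if ( !isset($dbconn) || !( $dbconn instanceof DatabaseConnection ) ) {
            $dbconn = new DatabaseConnection('localhost','user','password');
        }
        $results = $dbconn->query('select name from user');

        sort($results);

        return $results;
    }
}

// Hypothetical fake: skips the parent constructor and returns canned, unsorted rows.
class FakeDatabaseConnection extends DatabaseConnection
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function query($sql)
    {
        return $this->rows;
    }
}

$users  = new MyUserClass();
$sorted = $users->getUserList(new FakeDatabaseConnection(array('carol', 'alice', 'bob')));
print_r($sorted); // alice, bob, carol
```

The test now exercises only the sorting logic; no connection, table, or cleanup is involved.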


Gains from testable code

Certainly writing more testable code has the obvious gain of making it easier to write unit tests for your PHP application (as you've seen in the examples presented in this article), but it also produces a better designed, more modular, and more stable application in the process. We've all seen various levels of "spaghetti" code in PHP applications that tightly intertwines business and display logic into one big procedural mess, which undoubtedly causes support nightmares for anyone having to delve into it. In the process of making code testable, we've refactored previously problematic code, problematic not only in its design but also in its function. By removing hard-coded dependencies, we've made the functions and classes less single-purpose and more usable by other areas of the application, opening up options for better code reuse. Additionally, we've made future support of the code base easier by replacing poor-quality code with much better quality code.


Conclusion

In this article, we looked at making PHP code more testable through several examples of untestable code commonly found in PHP applications. We explored how these situations arise in an application, and then saw how best to fix the problematic code to make testing possible. We also saw how these changes not only made the code more testable, but also improved the code quality in general and promoted code reuse in the refactored code sections.


Download

Description            Name               Size
Article source code    code_examples.zip  3KB
