Moving a PHP application to DB2 from MySQL, Part 3: Convert your PHP code

Lessons from an IBM intranet application case study

Learn why to move a PHP application to DB2®, how to plan the migration, how to execute it, how to support it, and how to handle potential risks based on the experience of an IBM intranet application case study. This four-part series shares lessons from a successful MySQL-to-DB2 migration for a mission-critical PHP intranet application used by 4,000 global users within IBM to support content production for ibm.com. Part 3 describes the steps taken to convert the PHP code.

Share:

Daniel Krook (krook@us.ibm.com), Software Engineer, IBM

Daniel Krook is an IBM/Open Group Master Certified IT Specialist based in the greater New York City area. He has over ten years experience in web application development and currently builds a Cloud infrastructure for IBM using Java EE, DB2, REST, and mobile technologies. He holds certifications in PHP, Java EE, BlackBerry, DB2, and Solaris. He writes PHP-related articles for IBM developerWorks and co-authored the IBM Redbook "Developing PHP Applications for IBM Data Servers."


developerWorks Contributing author
        level

Yan Li Mu (yanlimu@cn.ibm.com), IT Architect, I.B.M.

Yan Li Mu is an IT Architect working in Dalian, China. He has over eight years of experience in web application development with a focus on Java EE technologies, PHP, and database development.



Mark Nusekabel (zedot@us.ibm.com), Senior IT Architect, I.B.M.

Mark Nusekabel is an IBM/Open Group Master Certified IT Architect living in the Tampa Bay, Florida area. He has over 20 years of experience in information technology, and he currently architects internal tools using JavaScript, PHP, Java, and DB2. He holds certifications in e-business solutions, Java, and XML.



01 September 2011

Also available in Chinese Russian Japanese Vietnamese

Introduction to the series

MySQL is currently the most common database server used with the PHP programming language to build dynamic web applications. However, DB2 is another popular database that is well-supported by PHP and offers compelling advantages over MySQL that make it an ideal choice for many applications.

This series describes why it makes sense to move a PHP application to DB2, how to prepare for the migration, how to execute it, how to support it, and how to handle potential risks based on the authors' experience with a recent migration. Many code and configuration samples are provided, as well as pointers to resources to help the project run smoothly.

With examples and lessons learned from a successful real-life conversion, you will see that this can be a straightforward project that is well documented and offers compelling benefits.

This four-part series shares lessons learned from a successful MySQL to DB2 migration for a production-grade, mission-critical PHP intranet application used by 4,000 global users within IBM to support content production for ibm.com.

  • Part 1 describes the steps taken to prepare for the migration.
  • Part 2 describes the steps taken to migrate the database.
  • Part 3 describes the steps taken to convert the PHP code.
  • Part 4 describes the steps taken to deploy and support the application.

What you'll learn

The goal of this article series is to provide you with an understanding of what is generally needed to migrate a PHP application from MySQL to DB2, what resources are available to help you, and how an IBM project team carried out the task in early 2010.

If you have investigated a migration from MySQL to DB2, you might have already seen the value DB2 offers based on product literature, performance benchmarks, features you have read about in the DB2 documentation, or comparisons in IBM Redbooks ® dedicated to the task, including the MySQL to DB2 Conversion Guide (see Resources ).

You might also know that DB2 Express-C is a free, fully functional relational data server that can be easily installed or evaluated using the IBM SmartCloud (formerly known as the Development and Test on the IBM Cloud) or Amazon EC2. Links to these offerings are available in the Resources section.

This article series provides you a concrete example of how a real world migration was executed successfully in 2010 for a heavily used PHP intranet application within IBM to support the day-to-day management of content that is published for many sections of the ibm.com web site.

After you finish reading this series, you will be able to make the case for a similar migration, understand the timing and dependencies of work items that need to be executed, anticipate potential risks, and know where to look for support each step of the way. All of this will give you more confidence to choose DB2 and make the best use of it for your PHP applications that are currently built on MySQL.

What is not covered

This article series intends to share the lessons learned from an internal IBM migration to DB2 from MySQL and provide you with information on the resources available to you to perform a similar undertaking. It is not a comprehensive guide to migration that is applicable to all scenarios.

To determine an approach that's right for you, refer to the MySQL to DB2 Conversion Guide or contact the Software Migration Project Office (SMPO) for a free migration estimate. Links are provided in the Resources section.


Introduction to the code migration

This article covers the five primary work efforts in the case study to carry out the migration of the PHP application code from using MySQL drivers and syntax to those that work with DB2. If needed, refer to Part 1 of the series to see the scheduling of these steps in the overall migration process before proceeding with the conversion.

Step 1: Begin a first pass at the code migration
  • Ensure that the PHP configuration is updated to support DB2
  • Update the individual SQL statements to support DB2 syntax
  • Emulate native MySQL functions using DB2 user-defined functions, if needed
  • Move logic into PHP from SQL, if needed
Step 2: Begin a second pass at the code migration
  • Examine the changes needed to support the appropriate isolation levels
  • Reorganize the queries into logical units of work to achieve better data integrity and improve performance
Step 3: Begin initial business use case testing with stakeholders
  • Engage stakeholders to execute use cases that they are accustomed to performing in the old system
  • Capture test failures as defects to be analyzed and fixed by developers
Step 4: Address bottlenecks and confirm against functional baseline
  • Improve the performance of the system based on user feedback after functional verification
  • Focus on what DB2 can fix for you automatically, because DB2 is the largest change you introduced
  • Address PHP bottlenecks by looking at operating system resource usage
Step 5: Evaluate the code migration baseline
  • After iteratively completing each of the steps above, declare the code conversion complete
  • Back up the system, and tag the milestone in your source control system
  • Assess the level of preparation needed for the next task: application deployment

Understanding the existing PTT PHP codebase example case study

A reminder

If needed, reacquaint yourself with the Resources that might be helpful during your migration. The following resources might be especially useful for this step:

  • Chapters 8 and 10 in the free IBM Redbook ®MySQL to DB2 Conversion Guide
  • Chapters 4 and 6 in the free IBM Redbook Developing PHP Applications for IBM Data Servers
  • The developerWorks article "Recommended reading list: DB2 for Linux, UNIX, and Windows application development"
  • Daniel Krook's blog entries about his personal experience in another migration project for an ISV

Of course, the PHP programming language and DB2 driver extension documentation were also instrumental.

Another option is to leverage the cloud for the migration process. You can use the Amazon EC2 Linux and DB2 AMIs, or sign up for the IBM SmartCloud (formerly known as the Development and Test on the IBM Cloud) (see Resources ).

For the example application in this article, the source code for the Project Tracking Tool (PTT) consists of several hundred PHP files. The codebase contains function libraries, object-oriented code organized into data transfer objects and manager classes, and various HTML template fragments and helpers that render the user interface.

The PTT database is used for various functions to support the workflow of information published on ibm.com. Over 4,000 users worldwide access and modify the PTT database through the PHP web front end. At any given time, several hundred users are active on the system concurrently.

The code is deployed onto a single Apache web server that loads mod_php as a shared module.

In the example, the existing code was updated for the new DB2 system, primarily by making changes to the embedded SQL. Other minor modifications were required in order to update the PHP configuration to use the DB2 driver and to tweak the database connection code to use a new connection string. This article also identifies and addresses some improvements to the application structure to better achieve a model-view-controller (MVC) architectural model to improve application quality, organization, and maintainability.


Installing the conversion software

To prepare for the example code conversion, install the following components on a Windows workstation to be used to carry out the conversion steps.

A version of MySQL copy of the source database
In order to verify each change to the code, it's important to have a copy of the old system available to use as a functional reference in order to validate changes to the new system side-by-side with the old. You can use the same system as you did in Part 2.
A version of DB2 with the data server drivers installed locally or on a test server
Install DB2 to create the new target database on the workstation. In general, this does not need to be the same edition as the database used in production, but for full feature compatibility, it is a good idea. To follow the example in this article, install DB2 Enterprise Server Edition Version 9.7.2. Ensure that the data server drivers are available to provide the necessary PHP client libraries. You can use the same system as you did in Part 2.
A version of PHP built with the ibm_db2 extension or PDO_IBM (PHP Data Objects)
Download the latest version of Zend Server, and select the optional DB2 extension, which requires the installation of additional packages.
An IDE, such as Zend Studio or Eclipse PHP Development Tools (PDT)
Use a PHP-aware IDE, such as Zend Studio or the Eclipse PHP Development Tools, in order to ease your PHP development. Because you maintain a PHP application, it's likely you already have a preferred tool.

Be sure to document your configuration decisions and lessons-learned carefully so that you can repeat the steps when you deploy. Consider saving a snapshot of the Windows operating system to a virtual image at key milestones when important goals are achieved in order to serve as configuration backups and baselines for further code improvement comparisons.

If you want to capture an image of a physical machine configuration, you can do so with the free VMware vCenter Converter. As an alternative, consider using the cloud for these hands-on changes. You can use the Amazon EC2 DB2 AMIs, or you can sign up for the IBM SmartCloud (formerly known as the Development and Test on the IBM Cloud). With virtual computers, you'll be able to avoid the initial effort to procure server hardware and install an operating system and DB2, which will save time, speed the migration process, and give you more confidence to experiment with the configuration that best fits your needs. Links to all these products can be found in Resources.


Step 1: Begin a first pass at the code migration

The first major code conversion step is to configure the new PHP and DB2 infrastructure and translate the way that the application uses PHP and SQL to access data. This step involves the following substeps:

Update the PHP driver and configuration

Ensure that the existing PHP configuration is modified to support the DB2 driver, to match the server database encoding, and to provide important diagnostic messages via verbose warning and error notifications.

In order to update the PHP code for DB2, you will normally change each spot in the code where the ibm_db2 extension functions are used, or update the PDO_IBM connection string that implements the PDO (PHP Data Objects) interface. If you use PDO as a database abstraction library in your MySQL-based system, the changes to your code should be minor. It is a matter of changing the single connection string. But even if you are using the procedural ibm_db2 extension, this task should be straightforward. You can normally replace any functions that start with mysql_ or mysqli_ in your code with db2_ prefixed equivalents. See section 6.3 in Chapter 6 of the IBM Redbook Developing PHP Applications for IBM Data Servers (see Resources) for function mapping tables and sample code.

Both connection methods rely on the DB2 client as a bridge to connect to the DB2 server, so you need to confirm that the DB2 client has been configured to match the settings on the remote server. In particular, you should make sure the code page of the DB2 client is the same as that of the DB2 server; otherwise, you will have trouble with character sets. For the example, run the command shown in Listing 1 to set the client encoding to UTF-8, which will match the CREATE DATABASE command in Listing 1 in Part 2.

Listing 1. Set the DB2 client code page to UTF-8
db2set db2codepage=1208

To ensure this setting is in place on each of the workstations or servers where you install the runtime client, run the command in Listing 2.

Listing 2. View the DB2 configuration variables
db2set -all

Finally, during development it is helpful to ensure that all error messages and warnings are captured and shown in the browser. In your php.ini file, update the parameters shown in Listing 3. Remember to set them back to less verbose settings when you move to production. Part 4 of this articles series provides an example of a more appropriate error-reporting mechanism to use when you deploy to production.

Listing 3. Set error control parameters in php.ini
display_errors = On
error_reporting = E_ALL & ~E_NOTICE

Convert SQL syntax

Next, update the individual SQL statements to support DB2 syntax, or alter them to access data in a slightly different way that achieves the same result. For most applications, this step accounts for the bulk of the code updates and verification testing. Refer to Chapter 8.1 Data Manipulation Language differences and similarities in the Redbook MySQL to DB2 Conversion Guide to see a list of common syntax conversions. Following are a few more situations in the case study application.

SELECT clause wildcard behavior
SELECT statements that use a wildcard (the asterisk character) to choose specific columns in addition to all columns from multiple tables are legal in MySQL, but they are not allowed in DB2. In this case, add the table name (or alias) qualifier prefix to the wildcard, specify each column from each table you are interested in, or simply list all columns in each table with a single wildcard. Listing 4 shows how to modify such queries.
Listing 4. Comparing the legal column and table wildcards in MySQL and DB2
-- In MySQL: 
SELECT R.NAME, * 
FROM USER U, ROLE R 
WHERE U.ROLE_ID = R.ID;

-- In DB2: 
SELECT R.NAME, U.* 
FROM USER U, ROLE R 
WHERE U.ROLE_ID = R.ID;

-- Or
SELECT R.NAME, U.ID, U.NAME, U.ROLE_ID, U.DEPART_ID 
FROM USER U, ROLE R 
WHERE U.ROLE_ID = R.ID;
SELECT clause specifying the maximum result set size with LIMIT
LIMIT is a MySQL-only, non-standard keyword that specifies the maximum number of rows to return from a query. DB2 uses the FETCH FIRST n ROWS ONLY syntax for the same purpose. Listing 5 shows you how to rewrite the query for DB2.
Listing 5. Specifying the maximum number of rows to return from a query in MySQL and DB2
-- In MySQL: 
SELECT * FROM ROLE LIMIT 10;

-- In DB2: 
SELECT * FROM ROLE FETCH FIRST 10 ROWS ONLY;
GROUP BY clause
MySQL allows the use of a GROUP BY clause without specifying each of the column names not operated by on an aggregate function in the column list. But in DB2, this is illegal, because the resultset can contain a degree of ambiguity unacceptable to many users. Listing 6 shows how to explicitly specify the columns in the query for DB2.
Listing 6. Comparing the legal GROUP BY aggregations in MySQL and DB2
-- In MySQL: 
SELECT R.ID, R.NAME, COUNT(U.ID) AS NUM 
FROM USER U, ROLE R 
WHERE U.ROLE_ID = R.ID 
GROUP BY R.ID;
				
-- In DB2: 
SELECT R.ID, MIN(R.NAME), COUNT(U.ID) AS NUM 
FROM USER U, ROLE R 
WHERE U.ROLE_ID = R.ID 
GROUP BY R.ID;

-- Or
SELECT R.ID, R.NAME, COUNT(U.ID) AS NUM 
FROM USER U, ROLE R 
WHERE U.ROLE_ID = R.ID 
GROUP BY R.ID, R.NAME;

In Listing 6, you can update the GROUP BY, because both R.ID and R.NAME are unique keys in this table. Listing 7 shows another kind of GROUP BY query that can't be handled as simply as the one in Listing 6.

Listing 7. Ambiguous GROUP BY in MySQL that needs to be modified for DB2
-- In MySQL: 
SELECT R.ID, R.NAME, COUNT(U.ID) AS NUM 
FROM USER U, ROLE R 
WHERE U.ROLE_ID = R.ID 
GROUP BY R.NAME;

In this case, R.ID is unique within the table, but R.NAME is not. If there are duplicate R.NAME values in the ROLE table, then you can't replace R.ID with MIN(R.ID), and you can't add R.ID into a GROUP BY clause. How you convert this SQL depends on what kind of result you expect. Listing 8 shows a few options for this situation.

Listing 8. Translated GROUP BY query in DB2
-- In DB2: 
-- Option 1, if you want to get same result as what you get in MySQL, the SQL is: 
SELECT (SELECT RL.ID FROM ROLE RL WHERE RL.NAME = R.NAME FETCH FIRST 1 ROW ONLY) AS ID,
R.NAME, COUNT(U.ID) AS NUM 
FROM USER U, ROLE R 
WHERE U.ROLE_ID = R.ID 
GROUP BY R.NAME;
				
-- Option 2, if role name is same, treat it as same role. In this case, 
-- the result is a little different from MySQL version in that there is no R.ID:
SELECT R.NAME, COUNT(U.ID) AS NUM 
FROM USER U, ROLE R 
WHERE U.ROLE_ID = R.ID 
GROUP BY R.NAME;
				
-- Option 3, if role ID is different, treat them as different role. 
-- In this case, the result is totally different than the MySQL version:
SELECT R.ID, R.NAME, COUNT(U.ID) AS NUM 
FROM USER U, ROLE R 
WHERE U.ROLE_ID = R.ID 
GROUP BY R.ID, R.NAME;
REPLACE INTO in MySQL versus MERGE in DB2
MySQL provides a REPLACE INTO clause. DB2 provides a MERGE clause for a similar, but not equivalent, purpose. To achieve the equivalent result, create a new record or replace an existing one if it has the same primary key or unique value. Listing 9 shows how to check for a unique value in the table and either update or insert values.
Listing 9. Translating MySQL REPLACE INTO syntax for DB2
-- In MySQL: 
REPLACE INTO ROLE (ID, NAME, DESCRIPTION)
SELECT ID, NAME, DESCRIPTION 
FROM ROLE_TMP;

-- In DB2: 
MERGE INTO ROLE R
USING (SELECT ID, NAME, DESCRIPTION FROM ROLE_TMP) RT
ON (R.ID = RT.ID)
WHEN MATCHED THEN
UPDATE SET (ID, NAME, DESCRIPTION) = (RT.ID, RT.NAME, RT.DESCRIPTION)
WHEN NOT MATCHED THEN
INSERT (ID, NAME, DESCRIPTION) VALUES (RT.ID, RT.NAME, RT.DESCRIPTION);
JOIN clause
When performing an outer join on a pair of tables while also retrieving data from a third table, specify the tables used for the outer join adjacent to the join keywords in DB2. MySQL allows them to be listed in either order. Listing 10 shows the required change.
Listing 10. Comparing MySQL and DB2 JOIN syntax
-- Both of the following work in MySQL: 
SELECT * 
FROM USER U, ROLE R 
LEFT JOIN DEPARTMENT D 
ON U.DEPT_ID = D.ID;

-- Or
SELECT * 
FROM ROLE R, USER U 
LEFT JOIN DEPARTMENT D 
ON U.DEPT_ID = D.ID;

-- Only this one works in DB2: 
SELECT * FROM ROLE R, USER U 
LEFT JOIN DEPARTMENT D 
ON U.DEPT_ID = D.ID;
Escape characters
The character sequence \ represents a backslash escaped single quote ' in MySQL, but in DB2, a single quote must be escaped by adding another single quote '' (double single quotes). Listing 11 shows an example of the escape sequences for a string that uses a single quote as an apostrophe.
Listing 11. MySQL uses the backslash to escape single quotes where DB2 uses a second single quote
-- In MySQL: 
SELECT * FROM ROLE 
WHERE DESCRIPTION = 'It\'s a super admin role';

-- In DB2: 
SELECT * FROM ROLE 
WHERE DESCRIPTION = 'It''s a super admin role';
Out of range checking in DB2
MySQL doesn't enforce the range of a data type in a SELECT clause, but DB2 does. Thus the SQL shown in Listing 12 that queries for information between February 1 and February 30 doesn't work in DB2, but it works in MySQL.
Listing 12. MySQL does not check the range of values for WHERE clauses against the data type
-- MySQL allows you to specify February 30th as part of the range 
SELECT * FROM USER 
WHERE BIRTHDAY BETWEEN '1980-02-01' AND '1980-02-30';
INSERT into a NOT NULL column without a default value
In MySQL, if you try to execute an INSERT statement that does not provide a value for a NOT NULL column, the command is successful. MySQL populates a default value automatically, even if you never define a default value for that column when creating the table. But in DB2, you must provide a value for NOT NULL columns in an INSERT statement when a default hasn't been defined. The SQL statements in Listing 13 show the different behaviors.
Listing 13. MySQL and DB2 handle the insertion of NULL values differently
-- Works in both MySQL and DB2 
CREATE TABLE TEST1 (ID INTEGER, NAME VARCHAR(20) NOT NULL); 

-- Works in MySQL, doesn't work in DB2
INSERT INTO TEST1 (ID) VALUES(1); 

-- Works in both MySQL and DB2 				
CREATE TABLE TEST2 (ID INTEGER, NAME VARCHAR(20) NOT NULL DEFAULT ''); 

-- Works in both MySQL and DB2
INSERT INTO TEST2 (ID) VALUES(1);

Using the new DB2 compatibility modes

While you can emulate some MySQL syntax by using user-defined functions, another new option is to set one of the compatibility modes for versions of DB2 after 9.7.2. Read more about enabling LIMIT and OFFSET syntax on Antonio Cangiano's blog.

Convert MySQL functions to similar functions in DB2
In addition to MySQL-specific SQL, you may also be using built-in MySQL functions. You can replace these with SQL standard functions or an equivalent function available in DB2. Refer to Appendix A: Mapping MySQL built-in functions and operators in the MySQL to DB2 Conversion Guide Redbook (see Resources).

As just described, if you can't find a DB2 equivalent function, you can rewrite your SQL to access data a different way. Alternatively, you can choose to emulate the function in DB2 or move the logic into your PHP code, as described in the following sections.

Create user-defined functions to emulate MySQL features

If you can't just update SQL statements in DB2 syntax to achieve the same results as your MySQL queries, you can implement a workaround in the form of a user-defined function (UDF) in DB2 SQL to emulate a built-in MySQL function.

There were several places that used MySQL and PHP in the example scenario that should be approached differently when updating the application to use DB2. In particular, you can move some functionality out of MySQL into PHP, such as when doing date translation. Or you can emulate function in DB2, such as to create user-defined functions in DB2 to approximate some functions that are built into MySQL, including UNIX_TIMESTAMP() and NOW(). Appendix B in the MySQL to DB2 Conversion Guide Redbook (see Resources) explains the differences, and it gives an approach to addressing changes. Daniel Krook's tips collected from his customer migration experience are also helpful (see Resources).

The DML of DB2 is different from MySQL. This article described how to convert embedded SQLs from MySQL to DB2. However, it is not necessary to convert all syntax from MySQL to DB2, especially for MySQL built-in functions. Appendix A. Mapping MySQL built-in functions and operators in the MySQL to DB2 Conversion Guide Redbook (see Resources) lists lots of MySQL built-in functions. Generally, many of the functions can be converted to DB2 UDFs. You decide which frequently used built-in functions to convert. For the example scenario, it is worth the effort to use DB2 UDFs to convert MySQL functions or registers if you use them frequently. Following is a list of MySQL functions and registers you can convert.

  • CURRENT_DATE() / CURDATE()
  • DATE_FORMAT()
  • DATEDIFF()
  • FROM_UNIXTIME()
  • NOW()
  • PERIOD_DIFF()
  • TO_DAYS()
  • UNIX_TIMESTAMP()
  • WEEKDAY()
  • YEARWEEK()

Fortunately, many of these functions have already been implemented and are freely available to reuse, as shown in the article "DB2 basics: Fun with dates and times" (see Resources). If you are using DB2 v9.7.2 or later, you can opt to set the new compatibility vector DB2_COMPATIBILITY_VECTOR=MYS.

DB2 makes user-defined functions easy, because they can be written in SQL rather than requiring the use of the C programming language, as is the case in MySQL.

Listing 14 is an example of a UDF that you can create to emulate MySQL's non-standard but commonly used NOW() function. You can use the emulation in many places in the example application.

Listing 14. A DB2 UDF definition to emulate the built-in MySQL NOW() function
CREATE FUNCTION NOW()
	RETURNS TIMESTAMP
	NO EXTERNAL ACTION

	BEGIN ATOMIC
		RETURN SELECT CURRENT TIMESTAMP
		FROM SYSIBM.DUAL;
	END

There are several other user-defined functions for emulating commonly used built-in MySQL functions listed in Appendix B of the MySQL to DB2 Conversion Guide Redbook (see Resources).

Move logic to PHP from SQL if necessary

There may be situations where you cannot translate MySQL functions or syntax into a convenient DB2 format. In such cases, look into whether PHP itself supports the feature and use that function or extension instead. For example, MySQL provides an SQL function called INET_ATON() that has no equivalent in DB2. To write a user-defined function for INET_ATON() could be error prone or tedious. Fortunately PHP provides the ip2long() function that offers the same functionality and is not dependent on the database vendor.


Step 2. Begin a second pass at the code migration

With the PHP code updated to modify the embedded SQL statements and to read or update the data in a manner similar to MySQL, the next step is to refactor your queries to achieve the non-functional goals of better data integrity and improved performance. This step involves the following substeps:

Select the appropriate approach to concurrency

Once the individual queries embedded in the MySQL-based application are converted to work on DB2 in an equivalent fashion, it's time to consider how they can be improved to take advantage of the DB2 concurrency and data integrity features. Examine the changes needed to support the appropriate isolation levels between several simultaneous connections to the database.

There are four isolation levels in DB2 to control concurrency. Isolation levels determine how a transaction hides in-progress data changes from other users. The isolation levels are as follows:

Repeatable read isolation level (RR)
All rows are locked until the end of the transaction.
Read stability isolation level (RS)
Rows qualifying the predicate condition are locked until the end of the transaction.
Cursor stability isolation level (CS)
Only those rows in which the cursor is positioned are locked. This is the default.
Uncommitted read isolation level (UR)
No rows are locked unless the data is changing.

Consider which isolation level is appropriate for your application from a performance and data integrity perspective. The concurrency decreases and data integrity increases when you move the isolation level from uncommitted read (UR) to repeatable read (RR). The RR isolation level ensures maximum data integrity at the cost of high performance. Conversely, UR ensures the highest level of performance at the cost of some data inconsistency. Figure 1 illustrates the relationship between performance and data integrity.

Figure 1. Isolation levels in DB2
Diagram with RR, RS, CS, and UR blocks vertically descending. Data integrity scale is highest at RR. Performance scale is highest at UR

You can specify the isolation level at different levels, from individual queries up through connections and the session. If you have a read-heavy workload, you might want to set isolation to uncommitted read (UR) at the connection level so that you don't have to specify it in each statement. If you have a mix of reads and writes, you may want to focus on modifying individual queries rather than changing from the default level, cursor stability (CS).

You might also want to consider the type of cursor you should use to optimize data access. Both the ibm_db2 and PDO_IBM drivers support two types of cursors: forward-only cursors and scrollable cursors. The default is the forward-only cursor. Generally speaking, the default is a good choice for the vast majority of use cases, in which case you iterate sequentially through a result set. In terms of performance, the forward-only cursor is better than the scrollable cursor. However, in some cases, you will need to read cursors both forward and backward, in which case you should use scrollable cursor. In the example scenario, the cursor type was set using the commands in Listing 15 for each necessary query.

Listing 15. Defining the DB2 scroll type from the ibm_db2 functions and PDO driver
-- For ibm_db2
db2_exec($connection_resource, $sql, array('cursor' => DB2_SCROLLABLE));

-- For PDO_IBM
$DB_PDO->prepare($sql, array(PDO::ATTR_CURSOR, PDO::CURSOR_SCROLL));

See section 4.2 Using PHP with DB2 database in the Developing PHP Applications for IBM Data Servers Redbook (see Resources) for more details.

Consolidate queries to improve data quality and performance

Next, group queries into logical units rather than independent queries. In this pass at the code migration, once you know that the SQL statements are receiving the same data and that the statements are modifying the data as they had in the old application, consider regrouping the data access queries and updates. The goal is to match the performance level of MySQL while ensuring the data integrity level of DB2.

Narrow the proportion of database connections to statements
In general, it's faster to build a connection to a database in MySQL than it is in DB2. Therefore reducing the number of total connections per HTTP request by reusing a connection for more than one query improves processor, disk, and network performance. For the example scenario, queries were consolidated, including getting user account information to perform more extensive joins up front (rather than using several follow-on lookup queries), as shown in Listing 16. This might come at the expense of pre-fetching too much data, but for the example scenario, performance improved, the amount of fetched data was reduced, and developers had a better idea of how much data actually should be retrieved for a given page, which also saved resources.
Listing 16. Consolidating queries
-- Two individual queries that require two trips to the data server on the page
SELECT FIN_PROJECT_MANAGER AS PM, PROJNAME FROM PROJECT WHERE ID = $id;
SELECT EMAIL, FIRSTNAME, LASTNAME FROM USER WHERE ID = $pm;

-- Single consolidated query that retrieves the same information in one trip
SELECT U.EMAIL, U.FIRSTNAME, U.LASTNAME
FROM USER U, PROJECT P
WHERE U.ID = P.FIN_PROJECT_MANAGER 
AND PROJECT.ID = $id

Why not migrate the Zend Framework?

The codebase for the example scenario predates most frameworks for PHP. Reusable components from PEAR and special purpose libraries such as database abstraction layers were starting to gain traction when we began writing PTT in 2002. Full stack frameworks like CakePHP and the Zend Framework came onto the scene years later. Adopting the Zend Framework is a desirable future goal, but updating the stable code to replace the database and start to lay the foundations for that were the primary scope of this code migration.

Make strides toward a more maintainable MVC architecture
In a well designed three-tiered web application built using the model-view-controller (MVC) architecture pattern, the controller handles a web request from a user and then invokes a command on the model to prepare a result. This result object is transferred to the view (often represented by a simple HTML template) to display the response data on the page. The original application did not use the MVC pattern. Instead, the application embedded queries directly into the templates, effectively intermingling the three responsibilities into one page.

By adopting an MVC best practice of grouping the queries, such as getting all user account information and all projects for that user in a single call to the model, the view page need only display the information, rather than incrementally fetching more data. This approach moves the application from being data-centric towards business object-centric, and therefore can help better organize your code by segmenting the data access logic from the layout and design markup.

Better distribute computational workload
Because the example scenario already had a single web server and a single database server, the web server mostly dealt with high volumes of HTTP traffic. The web server performed server-side logic through the PHP module. To improve performance at the HTTP server by reducing the CPU and memory consumption that the web server spent in mod_php, some business logic was offloaded to the data server. This often had the additional benefit of reducing the amount of network traffic (because it reduced traffic between the web and data servers), and therefore increased throughput by processing data at the same level at which it was accessed.
Improve data integrity by grouping queries on the server
The final and most important benefit for transactional integrity is to group logic into atomic, transactional units on the data server using stored procedures, triggers, and user-defined functions. You will find that the creation of tasks in an application like PTT is more reliable when the following happens:
  • All the data is transferred as a whole to the database
  • The multiple INSERT statements are performed together as a unit
  • A single success or failure error is returned to the user
Without a database that supported transactions, one or all of these database modifications might have failed in the example scenario, thereby leaving the system in an unpredictable state that required user or administrator intervention to resolve data inconsistency, such as a partially created task without an owner or other important linkage. Listing 17 shows an example of two related updates occurring as an atomic unit.
Listing 17. Group updates into a stored procedure
CREATE PROCEDURE BILLING_TYPE_UPDATE (IN p_date DATE)
BEGIN
    -- Update for project's billing type
    FOR row AS 
       SELECT * 
       FROM proj_billingtype_snapshot
       WHERE end_date IS NULL
       AND start_date = p_date
       
	    DO
	       -- Execute two updates in one transaction
	       t1: begin atomic   -- Transaction begins
	      
	       -- First update
	       UPDATE fin_attributes
	       SET proj_type = row.billing_type
	       WHERE project_id = row.proj_id;
	        
	       -- Second update
	       UPDATE fin_attributes_archive
	       SET proj_type = row.billing_type
	       WHERE project_id = row.proj_id
	       AND year = YEAR(NOW());
	          
	       end t1;            -- Transaction ends
    END FOR;
END

All DB2 versions support transactions, but most MySQL storage engines, including the default MyISAM type, don't support them. The example application was built using the default MyISAM table type, and the developers tried to enforce data integrity at the code level. Therefore it was a welcome transition to DB2, because it was possible to delegate transaction management to the database, which is something DB2 does very well. It also reduced the overall size and complexity of the code and the various moving parts that needed to work together on any given PHP page in the application. See section 5.2.8 Transactions and isolation level in the Redbook PHP Applications for IBM Data Servers (see Resources) to learn more about concurrency in DB2.


Step 3. Begin initial business use-case testing with stakeholders

By now, you should have a functional system that can be tested as a whole, or at least as a set of related functions that comprise a specific component. For example, creating a new task that is associated to a parent project is a fairly self-contained use case in the PTT system. Sending a piece of content, such as a product title and description, to a translation service is another use case that can be tested independently. This step of the conversion involves the following substeps:

Compile or create user-acceptance tests

In this step, start to engage interested stakeholders to execute a series of use cases that they are used to performing in the old system. You want to ensure that the same series of tests work as expected on the new system.

The example scenario used a fairly standard format for user-acceptance testing over the years. Therefore, there was an archive of tests to validate new features and to avoid regression problems with existing code. The test portfolio was expanded to accommodate the critical functions core to PTT's functionality. These tests generally encapsulate a set of related steps that achieve a business goal, and they resemble a use-case diagram. The difference from a static use-case model is that the tests are normally stored in a spreadsheet format along with screenshots and tables with updatable rows where a tester can enter whether a particular series of steps or individual steps passed or failed. Table 1 shows an example user-acceptance test case format to verify that task creation worked as expected.

Table 1. Sample user-acceptance test
TestExpected resultPassed?
1. Create a new task1.1 Log into the system.Welcome page is shown.Yes
1.2. Open new task page.New task form loads.Yes
1.3 Fill out form and save.Success message and unique task ID link shown.Yes
2. Approve task2.1 Log into the system.Welcome page is shown.Yes
2.2. Locate the list of tasks.The list of tasks loads.Yes
2.3 Click the approve button next to task.Success message notification shown and email sent. No

Keep in mind that it's critically important that your testers invest the proper time to execute user-acceptance tests and note any issues. If they simply skim through the tests, they may be overlooking errors. For the same reason, it is important to keep records of who tested at what time and what their reported results were in order to enforce accountability.

Unit test and fix problems

This step involves the systematic resolution of problems reported by user-acceptance testers. When the user-acceptance tests are received from the testers, problems that are reported are assigned to individual developers to validate and fix. For example, Test 2.3 in Table 1 would be assigned as a new defect.

As you iteratively migrate the system, developers should compare the migrated function in DB2 against the original MySQL source system to ensure that specific units of work performed as expected. Along with user-acceptance tests, it might be of value to automate this process using a unit-test framework such as PHPUnit. Because the example codebase was not well divided into MVC code that lends itself to easy programmatic testing, automated unit test or continuous integration system was not used.

Capture an image to use as a baseline

After the bulk of the use cases are complete, the final step before starting a new test iteration is to capture the current state of the system as a backup and a baseline to refer back to for functional and performance verification. Again, as iterative changes meet stable milestones, it's important to capture backups. This can be in the form of traditional file and SQL backups, or it can be in the form of complete operating system images. It is also important to keep these save points in case of an error down the road, or to use as a comparative baseline.


Step 4. Address bottlenecks and confirm against functional baseline

Once you have a complete set of user-acceptance tests that have been signed off by your stakeholders, it's time to improve the performance of the system. This step involves completing the following steps:

Confirm the baseline UAT tests

For this preparatory step, it's important to reiterate that you should have a good view of the confirmed functionality of the system so that you know whether any improvements to non-functional requirements, such as performance improvements, don't negatively impact or come at a tradeoff to the agreed functional requirements. This step should happen after functional verification for two reasons. First, it ensures that you have a baseline of known-good, signed-off functionality before making any non-functional changes. Second, it helps you avoid the pitfalls of premature optimization in your migration. You may know that a certain query or approach is faster in one case, but you should not blindly apply a change across the entire codebase, particularly if there is no indication that there is a performance problem.

For the example, a history was kept of the user-acceptance tests to confirm whether something had been tested and passed earlier to use to verify regression problems if they arose later.

Use tools to fix performance issues in DB2

The perils of premature optimization

It's likely you've heard of Donald Knuth's observation in 1974 that "premature optimization is the root of all evil." An example of the pitfalls of this comes from the experience of one of this article series' authors (we're not naming the guilty person!). When he learned that single-quoted strings perform better than double-quoted strings (due to the fact that interpolated variables are not replaced when enclosed in single quotes), he decided it might be good to replace all double quotes with single quotes. He had no reason to think the application was experiencing performance issues, but he executed what he thought was a pretty careful search and replace. Unfortunately, he missed one or two interpolated strings and introduced a very subtle bug that caused data integrity problems down the road. However, we have since forgiven him.

Your testing at this point should focus primarily on what DB2 can fix for you automatically, because DB2 is the largest change you introduced. As you change code, unit test, and perform UAT, you should identify bottlenecks and implement changes as needed. A key philosophy to follow is to change only the things that are known to be problematic through unit test and UAT testing rather than making assumptions about potential sources of performance problems. After this is done, you can then document what was changed and record it for future application against later problems.

There is a delicate balance here between premature optimization and applying fixes across the board based on a list of best practices. This will be an iterative process throughout the lifetime of the application. See the sidebar about the perils of premature optimization.

One technique that you may find useful is to implement a function, as was done for the example PTT application, that monitors scripts that run for longer than 60 seconds and send an error report. An example of this function is described in Part 4 of this article series.

Based on the information available in that report, you can discover whether the problem is a result of slow PHP execution or a slow query. For PHP problems, a debugger such as the one available in Zend Studio or the Eclipse PDT can help you locate the problem. For query problems, you can use IBM Data Studio to locate the problem and propose a fix. Figure 2 shows an example access plan graph for a given query.

Figure 2. Tuning a query in IBM Data Studio
Flow chart showing departments flowing to fetches and scans, then to join and to return

Locate and address bottlenecks in the operating system

If you've eliminated DB2 and PHP as the source of your application performance problems, you should iteratively locate additional bottlenecks by using the tried-and-true best practices described by John Coggeshall in "Zend Enterprise PHP Patterns" (see Resources). In short, you can start to look at three general sources of application bottlenecks in order to see if you can address any performance problems through a simple hardware upgrade or by tuning the resources allotted to a virtual machine:

  • CPU
  • Memory
  • Disk

While many of the tools used to determine bottlenecks in the Zend book apply to Linux operating systems, many of the techniques can be achieved with Windows-specific tools as you are developing or after you deploy your updated application to a test or staging server and again after deployment.


Step 5. Evaluate the code migration baseline

After one complete iteration of Steps 1-4, which in turn involved several passes themselves, you should have a functional database system on a Windows workstation and several notes on what you changed and what was problematic. If you used virtual machines, you can capture an image of the Windows system as a snapshot, both to archive as a functioning save point and to use as a baseline to compare future performance changes. Another option, of course, is to use a virtual image in the IBM or Amazon Cloud and use that in the same way. You might want to experiment with several different code updates to see what works best for your environment.

Once you are satisfied with the system on the Windows workstation used in the Steps 1-4, you can tag your converted code as a release in your version control system and prepare the infrastructure for the final steps before deployment as described in Part 4 of this article series.


Conclusion

The goal of this article series is to provide you with an understanding of what is generally needed to migrate a PHP application from MySQL to DB2, what resources are available to help you, and an example of a successful migration.

In this third part of the series, you:

  • Learned about the source PHP code you converted.
  • Learned how to update the application for DB2.
  • Learned how to test and tune the code after you converted it.

In Part 4, you'll learn how to deploy the migrated application and handle ongoing support.


Acknowledgments

The authors thank Leons Petrazickis and Ambrish Bhargava for their review and comments on this article.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, Open source
ArticleID=755093
ArticleTitle=Moving a PHP application to DB2 from MySQL, Part 3: Convert your PHP code
publish-date=09012011