Develop multitasking applications with PHP V5
PHP V5 may not support threading, but you can create applications that exploit in-process multitasking
PHP does not support threading. Despite this, and in contrast to what most PHP developers with whom I've spoken believe, PHP applications can multitask. Let's start with as clear a picture as possible of what "multitasking" and "threading" mean for PHP programming.
Varieties of concurrency
First, let's set aside several cases that are tangential to the main theme. PHP has a complex relationship with multitasking or concurrency. At a high level, PHP is constantly involved with multitasking: standard installations of server-side PHP (as an Apache module, for instance) are used in a multitasking way. That is, several clients (Web browsers) can simultaneously request the same PHP-interpreted page, and the Web server returns them all, more or less simultaneously.
One Web page doesn't block delivery of another, although they might interfere with each other slightly for such constrained resources as server memory or network bandwidth. In this way, a systemwide requirement for concurrency might well admit a PHP-based solution. In implementation terms, PHP lets its governing Web server take responsibility for the concurrency.
Client-side concurrency under the rubric of Ajax has also become a focus of developers in the past few years. While the meaning of Ajax has become a bit muddy, one aspect of it is that a browser display can simultaneously perform a calculation and remain responsive to such user actions as selection of a menu item. This is indeed a kind of multitasking. PHP-coded Ajax does this — but without any specific PHP involvement; Ajax frameworks for other languages operate just the same way.
A third instance of concurrency that only superficially involves PHP is PHP/TK. PHP/TK is an extension to PHP that provides portable graphical user interface (GUI) bindings to core PHP. PHP/TK allows for construction of desktop GUI applications coded in PHP. Its event-based aspects model a form of concurrency that's easy to learn and less error-prone than threading. Again, the concurrency is "inherited" from a complementary technology, rather than a fundamental feature of PHP.
There have been a few experiments to add threading support to PHP itself. To the best of my knowledge, none have been successful. However, the event-oriented achievements of Ajax frameworks and PHP/TK suggest that events might better express concurrency for PHP than threads do. PHP V5 proves that's so.
PHP V5 gives stream_select()
With standard PHP V4 and lower, all the work of a PHP application had to be done sequentially. If your program needed to retrieve the price of merchandise at two commercial sites, for instance, it requested the first price, waited until the response arrived, requested the second price, then waited again.
What if your programs could ask for several tasks to be done simultaneously? Your program as a whole would finish in a fraction of the time that sequential processing requires.
A first example
The new stream_select function, along with a few of its friends, makes this possible. Consider the following example.
Listing 1. Request several HTTP pages simultaneously
<?php
echo "Program starts at ". date('h:i:s') . ".\n";

$timeout = 10;
$result = array();
$sockets = array();
$convenient_read_block = 8192;

/* Issue all requests simultaneously; there's no blocking. */
$delay = 15;
$id = 0;
while ($delay > 0) {
    $s = stream_socket_client("phaseit.net:80", $errno, $errstr, $timeout,
        STREAM_CLIENT_ASYNC_CONNECT|STREAM_CLIENT_CONNECT);
    if ($s) {
        $result[$id] = '';          /* so that .= below has a string to extend */
        $sockets[$id++] = $s;
        $http_message = "GET /demonstration/delay?delay=" . $delay .
            " HTTP/1.0\r\nHost: phaseit.net\r\n\r\n";
        fwrite($s, $http_message);
    } else {
        echo "Stream " . $id . " failed to open correctly.\n";
    }
    $delay -= 3;
}

while (count($sockets)) {
    $read = $sockets;
    /* stream_select() takes all three arrays by reference,
       so they must be variables, not expressions like $w=null. */
    $write = null;
    $except = null;
    stream_select($read, $write, $except, $timeout);
    if (count($read)) {
        /* stream_select generally shuffles $read, so we need to
           compute from which socket(s) we're reading. */
        foreach ($read as $r) {
            $id = array_search($r, $sockets);
            $data = fread($r, $convenient_read_block);
            /* A socket is readable either because it has data to
               read, OR because it's at EOF. */
            if (strlen($data) == 0) {
                echo "Stream " . $id . " closes at " . date('h:i:s') . ".\n";
                fclose($r);
                unset($sockets[$id]);
            } else {
                $result[$id] .= $data;
            }
        }
    } else {
        /* A time-out means that *all* streams have failed to
           receive a response. */
        echo "Time-out!\n";
        break;
    }
}
?>
If you run this, you'll see output like what's shown below.
Listing 2. Typical output from the program of Listing 1
Program starts at 02:38:50.
Stream 4 closes at 02:38:53.
Stream 3 closes at 02:38:56.
Stream 2 closes at 02:38:59.
Stream 1 closes at 02:39:02.
Stream 0 closes at 02:39:05.
It's important to understand what's happening here. At a high level, this first program makes several HTTP requests and receives the pages the Web server sends it. While a production application would likely address several Web servers — perhaps google.com, yahoo.com, ask.com, etc. — this example sends all its requests to our corporate server at Phaseit.net, simply to reduce complexity.
The Web pages requested return results after a variable delay, shown below. If the program made its requests sequentially, it would take about 15+12+9+6+3 (45) seconds to finish. As Listing 2 shows, it actually finishes in 15 seconds. Tripling performance is great.
It's PHP V5's new stream_select function that makes this possible. The requests are initiated in a conventional way, by opening several sockets with stream_socket_client and writing to each of them a GET that corresponds to http://phaseit.net/demonstration/delay?delay=$DELAY. If you request this URL yourself from a browser, after a few seconds you'll see:
Starting at Thu Apr 12 15:05:01 UTC 2007.
Stopping at Thu Apr 12 15:05:05 UTC 2007.
4 second delay.
The delay server is implemented as CGI, as shown below.
Listing 3. Delay server implementation
#!/bin/sh
echo "Content-type: text/html

<HTML>
<HEAD></HEAD>
<BODY>"
echo "Starting at `date`."
RR=`echo $REQUEST_URI | sed -e 's/.*?//'`
DELAY=`echo $RR | sed -e 's/delay=//'`
sleep $DELAY
echo "<br>Stopping at `date`."
echo "<br>$DELAY second delay.</body></html>"
Although Listing 3's particular implementation is specific to UNIX®, almost all of this article applies equally well to Windows® (especially after Windows 98) or UNIX installations of PHP. Listing 1, in particular, can be hosted on either operating system. For this purpose, both Linux® and Mac OS X are UNIX variations, and all code here works for either.
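Readers on Windows, or anyone who would rather keep the whole demonstration in one language, might prefer a PHP version of the delay page. The following is a minimal sketch, not the implementation running on phaseit.net: the function name delay_page(), the 30-second $max_delay cap, and reading the value from $_GET['delay'] are all assumptions. Unlike Listing 3, which hands whatever follows "delay=" straight to sleep, it converts the parameter to an integer and clamps it.

```php
<?php
/*
 * Hypothetical PHP counterpart of Listing 3 (not the author's code).
 * Produces similar output; the delay comes from the query array and is
 * clamped to 0..$max_delay seconds, a cap Listing 3 does not have.
 */
function delay_page(array $query, $max_delay = 30)
{
    $delay = isset($query['delay']) ? (int) $query['delay'] : 0;
    $delay = max(0, min($max_delay, $delay));

    $html = "<html><head></head><body>";
    $html .= "Starting at " . date('D M j H:i:s T Y') . ".";
    sleep($delay);
    $html .= "<br>Stopping at " . date('D M j H:i:s T Y') . ".";
    $html .= "<br>" . $delay . " second delay.</body></html>";
    return $html;
}

/* Only emit a page when running under a Web server, not from the CLI. */
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/html');
    echo delay_page($_GET);
}
```

Installed under some hypothetical name such as delay.php, it would be requested as delay.php?delay=4, and the path in Listing 1 would need to change to match. The PHP_SAPI guard lets the same file be loaded from the command line to call delay_page() directly.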
Requests to the delay server are issued in the following order.
Listing 4. Sequence of process launches
delay=15
delay=12
delay= 9
delay= 6
delay= 3
The effect of stream_select is to receive results as quickly as possible. In this case, they arrive in the reverse of the order in which the requests were issued. After 3 seconds, the first page is ready to read. This part of the program is also conventional PHP; in this case, it uses fread. Just as in other PHP programs, the reading could equally well be done with fgets.
Processing continues in this same way. The program blocks at stream_select until data is ready. What's crucial is that it begins to read whenever any connection has data, in any order. This is how the program multitasks, or concurrently processes results from several requests.
Note that this waiting places essentially no burden on the host CPU. It's not unusual to run across networking programs that call fread inside a while loop in such a way that CPU usage zooms to 100 percent. That's not the case here, because stream_select has the desirable properties that it responds immediately whenever any reading is possible, yet it represents a negligible CPU load while waiting between reads.
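The contrast is easy to see side by side. A busy loop typically makes a stream non-blocking and calls fread over and over until something arrives; the sketch below instead parks the process inside stream_select until the stream is readable or a time-out expires. The helper name read_when_ready() and its return conventions are hypothetical, and the test setup relies on stream_socket_pair(), which assumes a UNIX-like host.

```php
<?php
/*
 * Spin-loop anti-pattern, shown for contrast only (do not use):
 *
 *   stream_set_blocking($s, false);
 *   while (($data = fread($s, 8192)) === '') { }   // pegs a CPU at 100%
 *
 * Hypothetical helper: wait inside stream_select() instead, so the
 * process uses essentially no CPU while nothing is happening.
 * Returns the data read, '' at end-of-file, or false on time-out/error.
 */
function read_when_ready($stream, $timeout_seconds)
{
    $read = array($stream);
    $write = null;    // must be variables: stream_select() takes references
    $except = null;
    if (stream_select($read, $write, $except, $timeout_seconds) === false) {
        return false;     // select itself failed
    }
    if (count($read) === 0) {
        return false;     // timed out; nothing became readable
    }
    return fread($stream, 8192);
}
```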
What you should know about stream_select()
Event-based programming like this is not elementary. While Listing 1 is reduced to its essentials, any coding that involves callbacks or coordination, as is necessarily the case for a multitasking application, will be less familiar than a simple procedural sequence. In this case, most of the challenge centers on the $read array. Notice that it's passed by reference: stream_select returns crucial information by altering the content of $read. Just as pointers have the reputation of being C's great stumbling block, references seem to be the part of PHP that presents the most difficulty to programmers.
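A small experiment makes the by-reference behavior concrete. In this sketch, which assumes a UNIX-like host because it uses stream_socket_pair() to stand in for network connections, two streams are watched but only one has data. After the call, $read has been rewritten to hold just the ready stream, while the original array is untouched; that is why Listing 1 copies $sockets into $read on every pass through its loop.

```php
<?php
/* Two local socket pairs stand in for two network connections. */
$pair_a = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$pair_b = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

$watched = array($pair_a[0], $pair_b[0]);   // the streams we care about
fwrite($pair_b[1], "ready\n");              // only stream B has data

$read = $watched;    // work on a copy: stream_select() overwrites it
$write = null;       // unused channels, but they must still be variables
$except = null;
stream_select($read, $write, $except, 1);

echo "watching " . count($watched) . ", readable " . count($read) . "\n";
foreach ($read as $r) {
    /* Identify the ready stream the same way Listing 1 does. */
    $id = array_search($r, $watched, true);
    echo "stream $id says: " . fgets($r);
}
```

Run as is, it reports two watched streams, one readable, and prints the line written to stream 1.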
You can use this technique to issue requests to any number of external Web sites, confident that your program will receive each result as soon as possible, without waiting on other requests. In fact, the same technique correctly handles any TCP/IP connection, not just ones to Web port 80, so you can, in principle, manage LDAP retrievals, SMTP transmissions, SOAP requests, and so on.
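Because the read loop of Listing 1 depends only on streams, not on HTTP, it can be factored into a reusable function. The sketch below is one way to do that under stated assumptions: the name collect_concurrently(), the 8192-byte read size, and returning results under the caller's own array keys are illustrative choices, not part of the original. It accepts any readable streams (sockets for any protocol, pipes) and accumulates each one's data until EOF, in whatever order the data arrives.

```php
<?php
/*
 * Hypothetical generalization of Listing 1's read loop.
 * $streams: readable streams, keyed however the caller likes.
 * Returns an array with the same keys holding all data read from each
 * stream. On a time-out, streams still open are left open and keep
 * whatever partial data had arrived.
 */
function collect_concurrently(array $streams, $timeout_seconds)
{
    $results = array();
    foreach ($streams as $key => $s) {
        $results[$key] = '';
    }
    while (count($streams) > 0) {
        $read = $streams;
        $write = null;
        $except = null;
        if (stream_select($read, $write, $except, $timeout_seconds) === false) {
            break;                      // select failed outright
        }
        if (count($read) === 0) {
            break;                      // time-out: nothing arrived in time
        }
        foreach ($read as $r) {
            $key = array_search($r, $streams, true);
            $data = fread($r, 8192);
            if ($data === false || $data === '') {
                fclose($r);             // readable but empty: EOF (as in Listing 1)
                unset($streams[$key]);
            } else {
                $results[$key] .= $data;
            }
        }
    }
    return $results;
}
```

Following the caution about the return value below, the function treats only an outright false as an error and decides between time-out and readiness by counting $read.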
But that's not all. PHP V5 manages a variety of connections as "streams," not just simple sockets. PHP's Client URL library (CURL) supports HTTPS certificates, FTP uploading, cookies, and much more. (CURL allows PHP applications to use a variety of protocols to connect to servers.) Because CURL provides a stream interface, connectivity is transparent from a program's perspective. The next section shows how stream_select even multiplexes local computations.
Several cautions also accompany stream_select. It's under-documented, in that even recent PHP books don't cover it, and several code examples available on the Web simply don't work or are confusing. The second and third arguments to stream_select, which manage write and exception channels corresponding to the read ones of Listing 1, should almost always be null. Because all three arrays are passed by reference, supply them as variables that hold null rather than as a literal null. With few exceptions, it's a mistake to select on writable or exceptional channels. Unless you're experienced, stick to readable selects.
Also, there apparently are errors in stream_select at least as late as PHP V5.1.2. Most crucially, the function's return value can't be trusted. While I haven't yet debugged the implementation, my experience is that it's safe to test count($read) as in Listing 1, but not the return value of stream_select itself, despite the official documentation.
Local PHP concurrency
The example and most of the discussion above have focused on how to manage several remote resources simultaneously and receive the results as they arrive, rather than waiting to process each one in the order of the original request. This certainly is an important use of PHP concurrency. Practical applications occasionally can be accelerated by a factor of 10 or more.
What if a slowdown is closer to home? Is there a way to speed up PHP results limited by local processing? There are several. If anything, these are even less well-known than the socket-oriented approach of Listing 1. There are several reasons for this, including:
- Most PHP pages are fast enough — Better performance would be an advantage, but not enough of one to merit investment in new code.
- PHP's use in Web pages can render partial accelerations inconsequential — Reordering a calculation so intermediate results become available more quickly doesn't matter when the only criterion of merit is how long it takes to deliver a Web page as a whole.
- Few local bottlenecks are under PHP's control — Users might complain that it takes 8 seconds to pull up details from an account record, but that's likely to be a constraint of database processing or some other resource external to PHP. Even if we reduce PHP processing to zero, it'll still take more than 7 seconds just for the lookup.
- Even fewer constraints are parallelizable — Suppose a particular page computes a suggested trade price for a specific listed common stock, and the calculation is sufficiently complicated to require many seconds. The calculation might be intrinsically sequential. There's no apparent way to partition it as "teamwork."
- Few PHP programmers understand PHP's potential for concurrency. Among the few with performance requirements that admit parallelization, most I've met simply recite that PHP "doesn't do threads" and resign themselves to their existing computational model.
Sometimes we can do better, though. Suppose a PHP page needs to calculate two stock prices, perhaps to compare them, and the underlying host happens to be a multiprocessor. In such a case, we might nearly double performance by assigning the two distinct and time-consuming calculations to different processors.
In the universe of all PHP calculations, such instances are rare. However, because I've found it accurately documented nowhere else, I want to include here a model for such speed-ups.
Listing 5. Running two subprocesses concurrently with proc_open and stream_select
<?php
echo "Program starts at ". date('h:i:s') . ".\n";

$timeout = 10;
$streams = array();
$handles = array();
$all_pipes = array();

/* First launch a program with a delay of three seconds, then one
   which returns after only one second. */
$delay = 3;
for ($id = 0; $id <= 1; $id++) {
    $error_log = "/tmp/error" . $id . ".txt";
    $descriptorspec = array(
        0 => array("pipe", "r"),
        1 => array("pipe", "w"),
        2 => array("file", $error_log, "w")
    );
    $cmd = 'sleep ' . $delay . '; echo "Finished with delay of ' . $delay . '".';
    $handles[$id] = proc_open($cmd, $descriptorspec, $pipes);
    $streams[$id] = $pipes[1];
    $all_pipes[$id] = $pipes;
    $delay -= 2;
}

while (count($streams)) {
    $read = $streams;
    $write = null;     /* must be variables: stream_select() takes references */
    $except = null;
    stream_select($read, $write, $except, $timeout);
    foreach ($read as $r) {
        $id = array_search($r, $streams);
        echo stream_get_contents($all_pipes[$id][1]);
        if (feof($r)) {
            fclose($all_pipes[$id][0]);
            fclose($all_pipes[$id][1]);
            $return_value = proc_close($handles[$id]);
            unset($streams[$id]);
        }
    }
}
?>
This program produces output like this:
Program starts at 10:28:41.
Finished with delay of 1.
Finished with delay of 3.
The point here is that PHP launched two independent subprocesses, retrieved the output of the first one to finish, then the output of the second one, even though the latter started earlier. If the host is a multiprocessor machine, and the operating system is correctly configured, the operating system itself takes responsibility for assigning the different subprograms to different processors. This is one way to use PHP to advantage on a multiprocessing host.
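Listing 5 can be packaged as a function that runs any list of shell commands side by side and reports their output in order of completion. The sketch below is one such packaging, and several choices in it are assumptions layered on the listing: the name run_concurrently(), discarding standard error to /dev/null, a 30-second inactivity limit, and switching the output pipes to non-blocking mode so that a command which writes part of its result never stalls the loop (Listing 5's blocking stream_get_contents call waits for EOF). It presumes a UNIX-like host, where stream_select works on pipes.

```php
<?php
/*
 * Hypothetical wrapper around the proc_open()/stream_select() pattern of
 * Listing 5. Starts every command at once, then returns an array mapping
 * each command's key to its complete standard output, ordered by when
 * the command closed its output. Commands that fail to start are
 * skipped; if 30 seconds pass with no output from any command, the rest
 * are terminated and left out of the result.
 */
function run_concurrently(array $commands)
{
    $spec = array(
        0 => array('pipe', 'r'),
        1 => array('pipe', 'w'),
        2 => array('file', '/dev/null', 'w'),   // discard standard error
    );
    $procs = array();
    $outputs = array();
    $buffers = array();
    foreach ($commands as $key => $cmd) {
        $pipes = array();
        $proc = proc_open($cmd, $spec, $pipes);
        if ($proc === false) {
            continue;
        }
        fclose($pipes[0]);                      // the commands get no input
        stream_set_blocking($pipes[1], false);  // read only what is ready
        $procs[$key] = $proc;
        $outputs[$key] = $pipes[1];
        $buffers[$key] = '';
    }

    $finished = array();
    while (count($outputs) > 0) {
        $read = $outputs;
        $write = null;
        $except = null;
        if (stream_select($read, $write, $except, 30) === false
            || count($read) === 0) {
            break;                              // error, or 30 s of silence
        }
        foreach ($read as $r) {
            $key = array_search($r, $outputs, true);
            $buffers[$key] .= stream_get_contents($r);
            if (feof($r)) {                     // command closed its output
                fclose($r);
                proc_close($procs[$key]);
                unset($outputs[$key]);
                $finished[$key] = $buffers[$key];
            }
        }
    }
    foreach ($outputs as $key => $r) {          // give up on stragglers
        fclose($r);
        proc_terminate($procs[$key]);
    }
    return $finished;
}
```

Called with two commands that each sleep for one second, it should return after roughly one second rather than two, which is the effect Listing 5 relies on; on a multiprocessor, CPU-bound commands gain in the same way when the operating system places them on different processors.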
Summary
PHP multitasks. PHP doesn't support threading in the way other languages like the Java™ programming language or C++ do, but the examples above show that PHP has more potential for speed-ups than many realize.
Related topics
- PHP partially supports "Process Control Functions" (PCNTL) that were, early in PHP V4's history, a preferred way to program concurrency.
- With PHP V5, stream_select and related functions supplant PCNTL for the concurrency described in this article.
- curl_multi_select is an undocumented entry point of PHP V5 that supports the same style of select programming this article presents and provides full CURL functionality.
- Wez Furlong implemented much of PHP's streams as used in this article.
- "PHP/TK is a native extension for the PHP programming language that greatly simplifies writing client-side cross-platform GUI applications," according to the PHP/TK Web site.
- PHP.net is the central resource for PHP developers.
- Check out the "Recommended PHP reading list."
- Browse all the PHP content on developerWorks.
- Expand your PHP skills by checking out IBM developerWorks' PHP project resources.
- Using a database with PHP? Check out the Zend Core for IBM, a seamless, out-of-the-box, easy-to-install PHP development and production environment that supports IBM DB2 9.