PHP does not support threading. Despite this, and in contrast to what most PHP developers with whom I've spoken believe, PHP applications can multitask. Let's start with as clear a picture as possible of what "multitasking" and "threading" mean for PHP programming.
The first step is to set aside several cases that are tangential to the main theme. PHP has a complex relationship with multitasking, or concurrency. At a high level, PHP is constantly involved with multitasking: standard installations of server-side PHP (as an Apache module, for instance) are used in a multitasking way. That is, several clients (Web browsers) can simultaneously ask for the same PHP-interpreted page, and the Web server returns them all, more or less simultaneously.
One Web page doesn't block delivery of another, although they might interfere with each other slightly for such constrained resources as server memory or network bandwidth. In this way, a systemwide requirement for concurrency might well admit a PHP-based solution. In implementation terms, PHP lets its governing Web server take responsibility for the concurrency.
Client-side concurrency under the rubric of Ajax has also become a focus of developers in the past few years. While the meaning of Ajax has become a bit muddy, one aspect of it is that a browser display can simultaneously perform a calculation and remain responsive to such user actions as selection of a menu item. This is indeed a kind of multitasking. PHP-coded Ajax does this — but without any specific PHP involvement; Ajax frameworks for other languages operate just the same way.
A third instance of concurrency that only superficially involves PHP is PHP/TK. PHP/TK is an extension to PHP that provides portable graphical user interface (GUI) bindings to core PHP. PHP/TK allows for construction of desktop GUI applications coded in PHP. Its event-based aspects model a form of concurrency that's easy to learn and less error-prone than threading. Again, the concurrency is "inherited" from a complementary technology, rather than a fundamental feature of PHP.
There have been a few experiments to add threading support to PHP itself. To the best of my knowledge, none have been successful. However, the event-oriented achievements of Ajax frameworks and PHP/TK suggest that events might better express concurrency for PHP than threads do. PHP V5 proves that's so.
With standard PHP V4 and lower, all the work of a PHP application had to be done sequentially. If your program needed to retrieve the price of merchandise at two commercial sites, for instance, it requested the first price, waited until the response arrived, requested the second price, then waited again.
What if your programs could ask for several tasks to be done simultaneously? Your program as a whole would finish in a fraction of the time sequential processing takes.
The new stream_select function, along with a few of its friends, makes this possible. Consider the following example.
Listing 1. Request several HTTP pages simultaneously
<?php
echo "Program starts at " . date('h:i:s') . ".\n";

$timeout = 10;
$result = array();
$sockets = array();
$convenient_read_block = 8192;

/* Issue all requests simultaneously; there's no blocking. */
$delay = 15;
$id = 0;
while ($delay > 0) {
    $s = stream_socket_client("phaseit.net:80", $errno,
                              $errstr, $timeout,
                              STREAM_CLIENT_ASYNC_CONNECT|STREAM_CLIENT_CONNECT);
    if ($s) {
        $sockets[$id] = $s;
        $result[$id] = '';   /* Accumulates the response for this stream. */
        $id++;
        $http_message = "GET /demonstration/delay?delay=" .
                        $delay . " HTTP/1.0\r\nHost: phaseit.net\r\n\r\n";
        fwrite($s, $http_message);
    } else {
        echo "Stream " . $id . " failed to open correctly.\n";
    }
    $delay -= 3;
}

while (count($sockets)) {
    /* stream_select takes its arrays by reference, so pass a copy of
       $sockets and real variables for the unused write and exception sets. */
    $read = $sockets;
    $write = null;
    $except = null;
    stream_select($read, $write, $except, $timeout);
    if (count($read)) {
        /* stream_select generally shuffles $read, so we need to
           compute from which socket(s) we're reading. */
        foreach ($read as $r) {
            $id = array_search($r, $sockets);
            $data = fread($r, $convenient_read_block);
            /* A socket is readable either because it has
               data to read, OR because it's at EOF. */
            if (strlen($data) == 0) {
                echo "Stream " . $id . " closes at " . date('h:i:s') . ".\n";
                fclose($r);
                unset($sockets[$id]);
            } else {
                $result[$id] .= $data;
            }
        }
    } else {
        /* A time-out means that *all* streams have failed
           to receive a response. */
        echo "Time-out!\n";
        break;
    }
}
?>
If you run this, you'll see output like what's shown below.
Listing 2. Typical output from the program of Listing 1
Program starts at 02:38:50.
Stream 4 closes at 02:38:53.
Stream 3 closes at 02:38:56.
Stream 2 closes at 02:38:59.
Stream 1 closes at 02:39:02.
Stream 0 closes at 02:39:05.
It's important to understand what's happening here. At a high level, this first program makes several HTTP requests and receives the pages the Web server sends it. While a production application would likely address several Web servers — perhaps google.com, yahoo.com, ask.com, etc. — this example sends all its requests to our corporate server at Phaseit.net, simply to reduce complexity.
The Web pages requested return results after a variable delay, shown below. If the program made its requests sequentially, it would take about 15+12+9+6+3 (45) seconds to finish. As Listing 2 shows, it actually finishes in 15 seconds. Tripling performance is great.
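The arithmetic behind that comparison can be checked with a short sketch. It assumes, as an idealization, that network and connection overhead are negligible, so the sequential time is the sum of the delays and the concurrent time is simply the longest single delay. The variable names are illustrative and do not appear in Listing 1.

```php
<?php
// Delays (in seconds) requested from the delay server, in launch order.
$delays = array(15, 12, 9, 6, 3);

// One request after another: every delay is paid in full.
$sequential = array_sum($delays);

// All requests in flight at once: only the slowest one matters.
$concurrent = max($delays);

printf("Sequential: %d s, concurrent: %d s, speed-up: %.1fx\n",
       $sequential, $concurrent, $sequential / $concurrent);
```

The same reasoning suggests that the benefit grows with the number of requests and shrinks when one request dominates the total.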
It's PHP V5's new stream_select function that makes this possible. The requests are initiated in a conventional way by opening several stream_socket_clients and writing a GET to each of them that corresponds to http://phaseit.net/demonstration/delay?delay=$DELAY. If you request this URL yourself from a browser, after a few seconds, you'll see:
Starting at Thu Apr 12 15:05:01 UTC 2007.
Stopping at Thu Apr 12 15:05:05 UTC 2007.
4 second delay.
The delay server is implemented as CGI, as shown below.
Listing 3. Delay server implementation
#!/bin/sh
echo "Content-type: text/html

<HTML> <HEAD></HEAD> <BODY>"
echo "Starting at `date`."
RR=`echo $REQUEST_URI | sed -e 's/.*?//'`
DELAY=`echo $RR | sed -e 's/delay=//'`
sleep $DELAY
echo "<br>Stopping at `date`."
echo "<br>$DELAY second delay.</body></html>"
Although Listing 3's particular implementation is specific to UNIX®, almost all of this article applies equally well to Windows® (especially after Windows 98) or UNIX installations of PHP. Listing 1, in particular, can be hosted on either operating system. For this purpose, both Linux® and Mac OS X are UNIX variations, and all code here works for either.
Requests to the delay server are issued in the following order.
Listing 4. Sequence of process launches
delay=15
delay=12
delay= 9
delay= 6
delay= 3
The effect of stream_select is to receive results as quickly as possible. In this case, that is the opposite of the order in which the requests were issued. After 3 seconds, the first page is ready to read. This part of the program is also conventional PHP; in this case, it uses fread. Just as in any other PHP program, the reading could equally well be done with fgets.
Processing continues in this same way. The program blocks at stream_select until data is ready. What's crucial is that it begins to read whenever any connection has data, in any order. This is how the program multitasks, or concurrently processes results from several requests.
Note that this places no burden on the host CPU. It's not unusual to run across networking programs that call fread in a while loop in such a way that CPU usage zooms to 100 percent. That's not the case here, because stream_select has the desirable properties that it responds immediately whenever any reading is possible, yet imposes a negligible CPU load while waiting between reads.
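The blocking behavior can be observed without any network access. The following is a minimal sketch, assuming a UNIX-like host where stream_socket_pair is available with STREAM_PF_UNIX; the local socket pair merely stands in for a remote connection, and the variable names are illustrative.

```php
<?php
// A connected pair of local sockets stands in for a client/server link.
// Assumption: UNIX-like host (STREAM_PF_UNIX is not available everywhere).
list($ours, $theirs) = stream_socket_pair(
    STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

$write = null;
$except = null;

// Nothing has been written yet, so stream_select sleeps for the whole
// one-second time-out and leaves $read empty.
$read = array($ours);
$started = microtime(true);
stream_select($read, $write, $except, 1);
$idle_wait = microtime(true) - $started;
$idle_count = count($read);

// Once the peer has written, the same call returns right away.
fwrite($theirs, "hello");
$read = array($ours);
$started = microtime(true);
stream_select($read, $write, $except, 5);
$busy_wait = microtime(true) - $started;
$data = fread($ours, 8192);

printf("Idle wait: %.2f s, wait with data pending: %.2f s\n",
       $idle_wait, $busy_wait);

fclose($ours);
fclose($theirs);
```

During the idle wait the process is asleep in the operating system's select call rather than spinning in a loop, which is why the CPU load stays low.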
What you should know about stream_select()
Event-based programming like this is not elementary. While Listing 1 is reduced to its essentials, any coding that involves callbacks or coordination, as is necessarily the case for a multitasking application, will be less familiar than a simple procedural sequence. In this case, most of the challenge centers on the $read array. Notice that it's passed by reference; stream_select returns crucial information by altering the content of $read. Just as pointers have the reputation of being C's great stumbling block, references seem to be the part of PHP that presents the most difficulty to programmers.
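To make the by-reference behavior concrete, here is a small sketch under the same assumption of a UNIX-like host with stream_socket_pair. Three local socket pairs play the role of three connections, and only one of them is given data. The names $peers and $ready_ids are hypothetical.

```php
<?php
// Three local socket pairs; $sockets holds the ends we read from.
$sockets = array();
$peers = array();
for ($id = 0; $id < 3; $id++) {
    list($sockets[$id], $peers[$id]) = stream_socket_pair(
        STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
}

// Only stream 1 has anything to read.
fwrite($peers[1], "ping");

// Pass a copy: stream_select overwrites $read with the ready subset,
// so selecting on $sockets directly would lose the other streams.
$read = $sockets;
$write = null;
$except = null;
stream_select($read, $write, $except, 5);

// Map each ready stream back to its position in the master list.
$ready_ids = array();
foreach ($read as $r) {
    $ready_ids[] = array_search($r, $sockets, true);
}

echo "Master list still holds " . count($sockets) . " streams; ready: " .
     implode(", ", $ready_ids) . "\n";
```

This is why Listing 1 assigns $read=$sockets at the top of every pass through its loop rather than selecting on $sockets itself.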
You can use this technique to issue requests to any number of external Web sites, confident that your program will receive each result as soon as possible, without waiting on other requests. In fact, the same technique correctly handles any TCP/IP connection, not just ones to Web port 80, so you can, in principle, manage LDAP retrievals, SMTP transmissions, SOAP requests, etc.
But that's not all. PHP V5 manages a variety of connections as "streams," not just simple sockets. PHP's Client URL library (CURL) supports HTTPS certificates, FTP uploading, cookies, and much more. (CURL allows PHP applications to use a variety of protocols to connect to servers.) Because CURL provides a stream interface, connectivity is transparent from a program's perspective. The next section shows how stream_select even multiplexes local computations.
Several cautions also accompany stream_select. It's under-documented in that even recent PHP books don't cover it. Several code examples available on the Web simply don't work, or are confusing. The second and third arguments to stream_select, which manage write and exception channels corresponding to the read ones of Listing 1, should almost always be null. With few exceptions, it's a mistake to select on writeable or exceptional channels. Unless you're experienced, stick to readable selects.
Also, there apparently are errors in stream_select at least as late as PHP V5.1.2. Most crucially, the function's return value can't be trusted. While I haven't yet debugged the implementation, my experience is that it's safe to test count($read) as in Listing 1, but not the return value of stream_select itself, despite the official documentation.
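One way to package these cautions is a small wrapper. The function readable_streams below is a hypothetical helper, not part of PHP: it selects only on the read set, uses the return value solely to detect an outright failure (false), and otherwise relies on the contents of $read, as the text above recommends. The demonstration again assumes a UNIX-like host with stream_socket_pair.

```php
<?php
// Hypothetical helper: returns the subset of $streams that is readable
// within $timeout seconds, or an empty array on time-out or error.
function readable_streams(array $streams, $timeout)
{
    // Selecting on an empty set is pointless (and rejected by some
    // PHP versions), so answer immediately.
    if (count($streams) === 0) {
        return array();
    }
    $read = $streams;   // copy; stream_select modifies its arguments
    $write = null;      // write and exception sets are deliberately unused
    $except = null;
    if (stream_select($read, $write, $except, $timeout) === false) {
        return array();
    }
    return $read;       // trust the array, not the numeric return value
}

// Demonstration with a local socket pair.
list($a, $b) = stream_socket_pair(
    STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$before = count(readable_streams(array($a), 0));
fwrite($b, "x");
$after = count(readable_streams(array($a), 1));
echo "Readable before write: $before, after write: $after\n";
```

Callers then loop over the returned array exactly as Listing 1 loops over $read.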
The example and most of the discussion above have focused on how to manage several remote resources simultaneously and receive the results as they arrive, rather than waiting to process each one in the order of the original request. This certainly is an important use of PHP concurrency. Practical applications occasionally can be accelerated by a factor of 10 or more.
What if a slowdown is closer to home? Is there a way to speed up PHP results limited by local processing? There are several. If anything, these are even less well-known than the socket-oriented approach of Listing 1. There are several reasons for this, including:
- Most PHP pages are fast enough — Better performance would be an advantage, but not enough of one to merit investment in new code.
- PHP's use in Web pages can render partial accelerations inconsequential — Reordering a calculation so intermediate results become available more quickly doesn't matter when the only criterion of merit is how long it takes to deliver a Web page as a whole.
- Few local bottlenecks are under PHP's control — Users might complain that it takes 8 seconds to pull up details from an account record, but that's likely to be a constraint of database processing or some other resource external to PHP. Even if we reduce PHP processing to zero, it'll still take more than 7 seconds just for the lookup.
- Even fewer constraints are parallelizable — Suppose a particular page computes a suggested trade price for a specific listed common stock, and the calculation is sufficiently complicated to require many seconds. The calculation might be intrinsically sequential. There's no apparent way to partition it as "teamwork."
- Few PHP programmers understand PHP's potential for concurrency. Among the few with performance requirements that admit parallelization, most I've met simply recite that PHP "doesn't do threads" and resign themselves to their existing computational model.
Sometimes we can do better, though. Suppose a PHP page needs to calculate two stock prices, perhaps to compare them, and the underlying host happens to be a multiprocessor. In such a case, we might nearly double performance by assigning the two distinct and time-consuming calculations to different processors.
In the universe of all PHP calculations, such instances are rare. However, because I've found it accurately documented nowhere else, I want to include here a model for such speed-ups.
Listing 5. Launch several subprocesses and read their output as it arrives
<?php
echo "Program starts at " . date('h:i:s') . ".\n";

$timeout = 10;
$streams = array();
$handles = array();
$all_pipes = array();

/* First launch a program with a delay of three seconds, then
   one which returns after only one second. */
$delay = 3;
for ($id = 0; $id <= 1; $id++) {
    $error_log = "/tmp/error" . $id . ".txt";
    $descriptorspec = array(
        0 => array("pipe", "r"),
        1 => array("pipe", "w"),
        2 => array("file", $error_log, "w")
    );
    $cmd = 'sleep ' . $delay . '; echo "Finished with delay of ' .
           $delay . '".';
    $handles[$id] = proc_open($cmd, $descriptorspec, $pipes);
    $streams[$id] = $pipes[1];
    $all_pipes[$id] = $pipes;
    $delay -= 2;
}

while (count($streams)) {
    /* stream_select needs real variables for its by-reference arguments. */
    $read = $streams;
    $write = null;
    $except = null;
    stream_select($read, $write, $except, $timeout);
    foreach ($read as $r) {
        $id = array_search($r, $streams);
        echo stream_get_contents($all_pipes[$id][1]);
        if (feof($r)) {
            fclose($all_pipes[$id][0]);
            fclose($all_pipes[$id][1]);
            $return_value = proc_close($handles[$id]);
            unset($streams[$id]);
        }
    }
}
?>
This program produces output like this:
Program starts at 10:28:41.
Finished with delay of 1.
Finished with delay of 3.
The point here is that PHP launched two independent subprocesses, retrieved the output of the first one to finish, then the output of the second one, even though the latter started earlier. If the host is a multiprocessor machine, and the operating system is correctly configured, the operating system itself takes responsibility for assigning the different subprograms to different processors. This is one way to use PHP to advantage on a multiprocessing host.
PHP multitasks. PHP doesn't support threading in the way other languages like the Java™ programming language or C++ do, but the examples above show that PHP has more potential for speed-ups than many realize.
Learn
- "A PHP V5 migration guide" tells what all PHP programmers should know about the latest major release.
- PHP partially supports "Process Control Functions" that were, early in PHP V4's history, a preferred way to program concurrency.
- With PHP V5, stream_select and related functions supplant PCNTL for the concurrency described in this article.
- curl_multi_select is an undocumented entry point of PHP V5 that supports the same style of select programming this article presents and provides full CURL functionality.
- Celebrated computer scientist John Ousterhout gave an Invited Talk in 1996 on Why Threads Are A Bad Idea (for most purposes).
- Wez Furlong implemented much of PHP's streams as used in this article.
- "PHP/TK is a native extension for the PHP programming language that greatly simplifies writing client-side cross-platform GUI applications," according to the PHP/TK Web site.
- This is the best-maintained list of Ajax frameworks for PHP available.
- PHP.net is the central resource for PHP developers.






