Develop multitasking applications with PHP V5

V5 may not be threaded, but you can create applications that exploit in-process multitasking

PHP does not support threading. Despite this, and in contrast to what most PHP developers with whom I've spoken believe, PHP applications can multitask. Let's start with as clear a picture as possible of what "multitasking" and "threading" mean for PHP programming.

Varieties of concurrency

First, let's set aside several cases that are tangential to the main theme. PHP has a complex relationship with multitasking, or concurrency. At a high level, PHP is constantly involved with multitasking: standard installations of server-side PHP (as an Apache module, for instance) are used in a multitasking way. That is, several clients (Web browsers) can simultaneously ask for the same PHP-interpreted page, and the Web server returns them all, more or less simultaneously.

One Web page doesn't block delivery of another, although they might interfere with each other slightly for such constrained resources as server memory or network bandwidth. In this way, a systemwide requirement for concurrency might well admit a PHP-based solution. In implementation terms, PHP lets its governing Web server take responsibility for the concurrency.

Client-side concurrency under the rubric of Ajax has also become a focus of developers in the past few years. While the meaning of Ajax has become a bit muddy, one aspect of it is that a browser display can simultaneously perform a calculation and remain responsive to such user actions as selection of a menu item. This is indeed a kind of multitasking. PHP-coded Ajax does this — but without any specific PHP involvement; Ajax frameworks for other languages operate just the same way.

A third instance of concurrency that only superficially involves PHP is PHP/TK. PHP/TK is an extension to PHP that provides portable graphical user interface (GUI) bindings to core PHP. PHP/TK allows for construction of desktop GUI applications coded in PHP. Its event-based aspects model a form of concurrency that's easy to learn and less error-prone than threading. Again, the concurrency is "inherited" from a complementary technology, rather than a fundamental feature of PHP.

There have been a few experiments to add threading support to PHP itself. To the best of my knowledge, none have been successful. However, the event-oriented achievements of Ajax frameworks and PHP/TK suggest that events might better express concurrency for PHP than threads do. PHP V5 proves that's so.

PHP V5 gives stream_select()

With standard PHP V4 and lower, all the work of a PHP application had to be done sequentially. If your program needed to retrieve the price of merchandise at two commercial sites, for instance, it requested the first price, waited until the response arrived, requested the second price, then waited again.

What if your programs could ask for several tasks to be done simultaneously? Your program as a whole would finish in a fraction of the time that sequential processing takes.

A first example

The new stream_select function, along with a few of its friends, makes this possible. Consider the following example.

Listing 1. Request several HTTP pages simultaneously
       <?php
        echo "Program starts at " . date('h:i:s') . ".\n";

        $timeout=10;
        $result=array();
        $sockets=array();
        $convenient_read_block=8192;

        /* Issue all requests simultaneously; there's no blocking. */
        $delay=15;
        $id=0;
        while ($delay > 0) {
            $s=stream_socket_client("phaseit.net:80", $errno,
                  $errstr, $timeout,
                  STREAM_CLIENT_ASYNC_CONNECT|STREAM_CLIENT_CONNECT);
            if ($s) {
                $sockets[$id]=$s;
                /* Start each response empty so that .= below is safe. */
                $result[$id]="";
                $id++;
                $http_message="GET /demonstration/delay?delay=" .
                    $delay . " HTTP/1.0\r\nHost: phaseit.net\r\n\r\n";
                fwrite($s, $http_message);
            } else {
                echo "Stream " . $id . " failed to open correctly.\n";
            }
            $delay -= 3;
        }

        while (count($sockets)) {
            $read=$sockets;
            /* stream_select takes its arrays by reference, so pass
               variables rather than expressions such as $w=null. */
            $write=null;
            $except=null;
            stream_select($read, $write, $except, $timeout);
            if (count($read)) {
                /* stream_select generally shuffles $read, so we need to
                   compute from which socket(s) we're reading. */
                foreach ($read as $r) {
                    $id=array_search($r, $sockets);
                    $data=fread($r, $convenient_read_block);
                    /* A socket is readable either because it has
                       data to read, OR because it's at EOF. */
                    if (strlen($data) == 0) {
                        echo "Stream " . $id . " closes at " . date('h:i:s') . ".\n";
                        fclose($r);
                        unset($sockets[$id]);
                    } else {
                        $result[$id] .= $data;
                    }
                }
            } else {
                /* A time-out means that *all* streams have failed
                   to receive a response. */
                echo "Time-out!\n";
                break;
            }
        }
       ?>

If you run this, you'll see output like what's shown below.

Listing 2. Typical output from the program of Listing 1
	 Program starts at 02:38:50.
	 Stream 4 closes at 02:38:53.
	 Stream 3 closes at 02:38:56.
	 Stream 2 closes at 02:38:59.
	 Stream 1 closes at 02:39:02.
	 Stream 0 closes at 02:39:05.

It's important to understand what's happening here. At a high level, this first program makes several HTTP requests and receives the pages the Web server sends it. While a production application would likely address several Web servers — perhaps google.com, yahoo.com, ask.com, etc. — this example sends all its requests to our corporate server at Phaseit.net, simply to reduce complexity.

The Web pages requested return results after a variable delay, shown below. If the program made its requests sequentially, it would take about 15+12+9+6+3 (45) seconds to finish. As Listing 2 shows, it actually finishes in 15 seconds. Tripling performance is great.
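The arithmetic behind that comparison can be sketched in a few lines. This is only an idealized model: it assumes the delays of Listing 1 and ignores connection setup and transfer time, and the variable names are illustrative.

```php
<?php
// Delays (in seconds) requested by Listing 1, in the order issued.
$delays = array(15, 12, 9, 6, 3);

// Sequential requests wait for each response in turn: the delays add up.
$sequential = array_sum($delays);

// Concurrent requests all wait at once: the slowest response dominates.
$concurrent = max($delays);

printf("Sequential: %d s, concurrent: %d s, speed-up: %.1fx\n",
    $sequential, $concurrent, $sequential / $concurrent);
```

In general, the more evenly the delays are distributed, the closer the speed-up comes to the number of requests; one slow response among many fast ones limits the gain.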

It's PHP V5's new stream_select function that makes this possible. The requests are initiated in a conventional way by opening several stream_socket_clients and writing a GET to each of them that corresponds to http://phaseit.net/demonstration/delay?delay=$DELAY. If you request this URL yourself from a browser, after a few seconds, you'll see:

	  Starting at Thu Apr 12 15:05:01 UTC 2007. 
	  Stopping at Thu Apr 12 15:05:05 UTC 2007. 
	  4 second delay.

The delay server is implemented as CGI, as shown below.

Listing 3. Delay server implementation
	  #!/bin/sh

	  echo "Content-type: text/html

	  <HTML> <HEAD></HEAD> <BODY>"

	  echo "Starting at `date`."
	  RR=`echo $REQUEST_URI | sed -e 's/.*?//'`
	  DELAY=`echo $RR | sed -e 's/delay=//'`
	  sleep $DELAY
	  echo "<br>Stopping at `date`."
	  echo "<br>$DELAY second delay.</body></html>"

Although Listing 3's particular implementation is specific to UNIX®, almost all of this article applies equally well to Windows® (especially after Windows 98) or UNIX installations of PHP. Listing 1, in particular, can be hosted on either operating system. For this purpose, both Linux® and Mac OS X are UNIX variations, and all code here works for either.
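For a host without a Bourne shell, roughly the same behavior could be sketched in PHP itself. This is not the server that Phaseit.net runs; the function name delay_from_query and the 30-second cap are assumptions added for illustration.

```php
<?php
// Hypothetical helper: extract a bounded delay (in seconds) from a query
// string such as "delay=4". The 30-second cap is an assumption added here
// to keep a stray request from tying up the server.
function delay_from_query($query)
{
    parse_str($query, $params);
    $delay = isset($params['delay']) ? (int) $params['delay'] : 0;
    return max(0, min($delay, 30));
}

// Serve the page only under a Web server, not from the command line.
if (PHP_SAPI !== 'cli') {
    $query = isset($_SERVER['QUERY_STRING']) ? $_SERVER['QUERY_STRING'] : '';
    $delay = delay_from_query($query);
    echo "<html><head></head><body>";
    echo "Starting at " . date('r') . ".";
    flush();
    sleep($delay);
    echo "<br>Stopping at " . date('r') . ".";
    echo "<br>" . $delay . " second delay.</body></html>";
}
```

Casting the parameter to an integer before sleeping also avoids passing unchecked request data to a shell, which the CGI version does.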

Requests to the delay server are issued in the following order.

Listing 4. Sequence of process launches
	delay=15
	delay=12
	delay= 9
	delay= 6
	delay= 3

The effect of stream_select is to receive results as quickly as possible. In this case, it's in the order opposite of the one in which they were issued. After 3 seconds, the first page is ready to read. This part of the program is also conventional PHP, in this case using fread. Just as in any other PHP program, the reading could equally well be done with fgets.

Processing continues in this same way. The program blocks at stream_select until data is ready. What's crucial is that it begins to read whenever any connection has data, in any order. This is how the program multitasks or concurrently processes results from several requests.

Note that this represents no burden on the host CPU. It's not unusual to run across networking programs that call fread in a while loop in such a way that CPU usage zooms to 100 percent. That's not the case here, because stream_select responds immediately whenever any reading is possible, yet imposes a negligible CPU load while waiting between reads.

What you should know about stream_select()

Event-based programming like this is not elementary. While Listing 1 is reduced to its essentials, any coding that involves callbacks or coordination, as is necessarily the case for a multitasking application, will be less familiar than a simple procedural sequence. In this case, most of the challenge centers on the $read array. Notice that it's passed by reference; stream_select returns crucial information by altering the content of $read. Just as pointers have the reputation of being C's great stumbling block, references seem to be the part of PHP that presents the most difficulty to programmers.
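To see how stream_select rewrites $read, it helps to try it without a network. The following is a minimal sketch that assumes a UNIX-like host (it relies on stream_socket_pair with STREAM_PF_UNIX); the variable names are illustrative.

```php
<?php
// Create three connected socket pairs. Index 0 of each pair is the end
// we read from; index 1 is the end we write to.
$readers = array();
$writers = array();
for ($i = 0; $i < 3; $i++) {
    $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM,
        STREAM_IPPROTO_IP);
    $readers[$i] = $pair[0];
    $writers[$i] = $pair[1];
}

// Make only stream 1 readable.
fwrite($writers[1], "hello");

// Work on a copy, because stream_select overwrites its arguments.
$read = $readers;
$write = null;
$except = null;
stream_select($read, $write, $except, 1);

// $read now holds only the streams that are ready.
$ready_ids = array();
foreach ($read as $r) {
    $ready_ids[] = array_search($r, $readers);
}
echo count($readers) . " streams watched; ready: " .
    implode(",", $ready_ids) . "\n";
```

The original $readers array still holds all three streams, while $read has shrunk to the one with data waiting. That is why Listing 1 copies $sockets into $read on every pass through its loop.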

You can use this technique to issue requests to any number of external Web sites, confident that your program will receive each result as soon as possible, without waiting on other requests. In fact, the same technique correctly handles any TCP/IP connection, not just ones to Web port 80, so you can, in principle, manage LDAP retrievals, SMTP transmissions, SOAP requests, etc.

But that's not all. PHP V5 manages a variety of connections as "streams," not just simple sockets. PHP's Client URL library (CURL) supports HTTPS certificates, FTP uploading, cookies, and much more. (CURL allows PHP applications to use a variety of protocols to connect to servers.) Because CURL provides a stream interface, connectivity is transparent from a program's perspective. The next section shows how stream_select even multiplexes local computations.

Several cautions also accompany stream_select. It's under-documented in that even recent PHP books don't cover it. Several code examples available on the Web simply don't work, or are confusing. The second and third arguments to stream_select, which manage write and exception channels corresponding to the read ones of Listing 1, should almost always be null. With few exceptions, it's a mistake to select on writeable or exceptional channels. Unless you're experienced, stick to readable selects.

Also, there apparently are errors in stream_select at least as late as PHP V5.1.2. Most crucially, the function's return value can't be trusted. While I haven't yet debugged the implementation, my experience is that it's safe to test count($read) as in Listing 1, but not the return value of stream_select itself, despite the official documentation.
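One way to follow both cautions is to wrap the call in a small helper. This is a sketch under stated assumptions: wait_for_readable is a hypothetical name, the demonstration assumes a UNIX-like host, and treating a false return as "nothing ready" is my own defensive choice, not something the PHP documentation prescribes.

```php
<?php
/* Hypothetical helper: return the subset of $streams that is readable,
   or an empty array on time-out or error. */
function wait_for_readable(array $streams, $timeout)
{
    if (!count($streams)) {
        return array();
    }
    $read = $streams;
    $write = null;     // readable selects only
    $except = null;
    // The count that stream_select returns is ignored; only an outright
    // failure (false) is checked, because $read may then be unchanged.
    if (@stream_select($read, $write, $except, $timeout) === false) {
        return array();
    }
    return $read;
}

// Demonstration with a local socket pair.
$pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM,
    STREAM_IPPROTO_IP);
$idle = wait_for_readable(array($pair[0]), 0);   // nothing written yet
fwrite($pair[1], "x");
$busy = wait_for_readable(array($pair[0]), 1);
echo count($idle) . " ready before the write, " .
    count($busy) . " after.\n";
```

Callers then test count() on the returned array, just as Listing 1 tests count($read).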

Local PHP concurrency

The example and most of the discussion above have focused on how to manage several remote resources simultaneously and receive the results as they arrive, rather than waiting to process each one in the order of the original request. This certainly is an important use of PHP concurrency. Practical applications occasionally can be accelerated by a factor of 10 or more.

What if a slowdown is closer to home? Is there a way to speed up PHP results limited by local processing? There are several. If anything, these are even less well-known than the socket-oriented approach of Listing 1. There are several reasons for this, including:

  • Most PHP pages are fast enough — Better performance would be an advantage, but not enough of one to merit investment in new code.
  • PHP's use in Web pages can render partial accelerations inconsequential — Reordering a calculation so intermediate results become available more quickly doesn't matter when the only criterion of merit is how long it takes to deliver a Web page as a whole.
  • Few local bottlenecks are under PHP's control — Users might complain that it takes 8 seconds to pull up details from an account record, but that's likely to be a constraint of database processing or some other resource external to PHP. Even if we reduce PHP processing to zero, it'll still take more than 7 seconds just for the lookup.
  • Even fewer constraints are parallelizable — Suppose a particular page computes a suggested trade price for a specific listed common stock, and the calculation is sufficiently complicated to require many seconds. The calculation might be intrinsically sequential. There's no apparent way to partition it as "teamwork."
  • Few PHP programmers understand PHP's potential for concurrency. Among the few with performance requirements that admit parallelization, most I've met simply recite that PHP "doesn't do threads" and resign themselves to their existing computational model.

Sometimes we can do better, though. Suppose a PHP page needs to calculate two stock prices, perhaps to compare them, and the underlying host happens to be a multiprocessor. In such a case, we might nearly double performance by assigning the two distinct and time-consuming calculations to different processors.

In the universe of all PHP calculations, such instances are rare. However, because I've found it accurately documented nowhere else, I want to include here a model for such speed-ups.

Listing 5. Launching local subprocesses concurrently
          <?php
          echo "Program starts at " . date('h:i:s') . ".\n";

          $timeout=10;
          $streams=array();
          $handles=array();
          $all_pipes=array();

          /* First launch a program with a delay of three seconds, then
             one which returns after only one second. */
          $delay=3;
          for ($id=0; $id <= 1; $id++) {
              $error_log="/tmp/error" . $id . ".txt";
              $descriptorspec=array(
                  0 => array("pipe", "r"),
                  1 => array("pipe", "w"),
                  2 => array("file", $error_log, "w")
              );
              $cmd='sleep ' . $delay . '; echo "Finished with delay of ' .
                      $delay . '".';
              $handles[$id]=proc_open($cmd, $descriptorspec, $pipes);
              $streams[$id]=$pipes[1];
              $all_pipes[$id]=$pipes;
              $delay -= 2;
          }

          while (count($streams)) {
              $read=$streams;
              /* stream_select takes its arrays by reference, so pass
                 variables rather than expressions such as $w=null. */
              $write=null;
              $except=null;
              stream_select($read, $write, $except, $timeout);
              foreach ($read as $r) {
                  $id=array_search($r, $streams);
                  echo stream_get_contents($all_pipes[$id][1]);
                  if (feof($r)) {
                      fclose($all_pipes[$id][0]);
                      fclose($all_pipes[$id][1]);
                      $return_value=proc_close($handles[$id]);
                      unset($streams[$id]);
                  }
              }
          }
         ?>

This program produces output like this:

	  Program starts at 10:28:41.
	  Finished with delay of 1.
	  Finished with delay of 3.

The point here is that PHP launched two independent subprocesses, retrieved the output of the first one to finish, then the output of the second one, even though the latter started earlier. If the host is a multiprocessor machine, and the operating system is correctly configured, the operating system itself takes responsibility for assigning the different subprograms to different processors. This is one way to use PHP to advantage on a multiprocessing host.
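The same pattern extends from sleep commands to real calculations. The sketch below runs two stand-in "calculations" in separate PHP child processes and collects each result as it arrives. It rests on several assumptions: a UNIX-like host, a command-line PHP at the path given by PHP_BINARY (available from PHP 5.4), and job names and expressions invented purely for illustration.

```php
<?php
// Two hypothetical "calculations", each run in its own PHP child process.
$jobs = array(
    'slow' => 'usleep(500000); echo 6 * 7;',
    'fast' => 'echo 2 + 3;',
);

$spec = array(1 => array("pipe", "w"));   // capture only the child's stdout
$procs = array();
$streams = array();
$output = array();
foreach ($jobs as $name => $code) {
    $cmd = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg($code);
    $proc = proc_open($cmd, $spec, $pipes);
    if ($proc === false) {
        echo "Could not start job " . $name . ".\n";
        continue;
    }
    $procs[$name] = $proc;
    $streams[$name] = $pipes[1];
    $output[$name] = '';
}

$finished = array();   // job names in the order they complete
while (count($streams)) {
    $read = $streams;
    $write = null;
    $except = null;
    stream_select($read, $write, $except, 10);
    if (!count($read)) {
        echo "Time-out!\n";
        break;
    }
    foreach ($read as $r) {
        $name = array_search($r, $streams);
        $data = fread($r, 8192);
        if ($data === false || strlen($data) == 0) {
            // Readable but empty means the child closed its output.
            fclose($r);
            proc_close($procs[$name]);
            unset($streams[$name]);
            $finished[] = $name;
        } else {
            $output[$name] .= $data;
        }
    }
}

foreach ($finished as $name) {
    echo $name . " => " . $output[$name] . "\n";
}
```

On most hosts, the "fast" job should be reported first even though it was launched second. Whether the two children actually run on different processors is up to the operating system.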

Summary

PHP multitasks. PHP doesn't support threading in the way other languages like the Java™ programming language or C++ do, but the examples above show that PHP has more potential for speed-ups than many realize.


ArticleID=240625
ArticleTitle=Develop multitasking applications with PHP V5
publish-date=08072007