Develop multitasking applications with PHP V5

V5 may not be threaded, but you can create applications that exploit in-process multitasking

Many PHP developers believe that because standard PHP lacks threading capabilities, it's impossible for a practical PHP application to multitask. For example, if an application needs information from a different Web site, it has to stall until that remote retrieval is done. Not true! Find out about in-process PHP multitasking using stream_select and stream_socket_client.

Cameron Laird (claird@phaseit.net), Vice president, Phaseit Inc.

Cameron Laird is a long-time developerWorks contributor and former columnist. He often writes about the open source projects that accelerate development of his employer's applications, focused on reliability and security.



07 August 2007

PHP does not support threading. Despite this, and in contrast to what most PHP developers with whom I've spoken believe, PHP applications can multitask. Let's start with as clear a picture as possible of what "multitasking" and "threading" mean for PHP programming.

Varieties of concurrency

First, let's set aside several cases that are tangential to the main theme. PHP has a complex relationship with multitasking or concurrency. At a high level, PHP is constantly involved with multitasking: standard installations of server-side PHP (as an Apache module, for instance) are used in a multitasking way. That is, several clients (Web browsers) can simultaneously ask for the same PHP-interpreted page, and the Web server returns them all, more or less simultaneously.

One Web page doesn't block delivery of another, although they might interfere with each other slightly for such constrained resources as server memory or network bandwidth. In this way, a systemwide requirement for concurrency might well admit a PHP-based solution. In implementation terms, PHP lets its governing Web server take responsibility for the concurrency.

Client-side concurrency under the rubric of Ajax has also become a focus of developers in the past few years. While the meaning of Ajax has become a bit muddy, one aspect of it is that a browser display can simultaneously perform a calculation and remain responsive to such user actions as selection of a menu item. This is indeed a kind of multitasking. PHP-coded Ajax does this — but without any specific PHP involvement; Ajax frameworks for other languages operate just the same way.

A third instance of concurrency that only superficially involves PHP is PHP/TK. PHP/TK is an extension to PHP that provides portable graphical user interface (GUI) bindings to core PHP. PHP/TK allows for construction of desktop GUI applications coded in PHP. Its event-based aspects model a form of concurrency that's easy to learn and less error-prone than threading. Again, the concurrency is "inherited" from a complementary technology, rather than a fundamental feature of PHP.

There have been a few experiments to add threading support to PHP itself. To the best of my knowledge, none have been successful. However, the event-oriented achievements of Ajax frameworks and PHP/TK suggest that events might better express concurrency for PHP than threads do. PHP V5 proves that's so.


PHP V5 gives stream_select()

With standard PHP V4 and lower, all the work of a PHP application had to be done sequentially. If your program needed to retrieve the price of merchandise at two commercial sites, for instance, it requested the first price, waited until the response arrived, requested the second price, then waited again.

What if your programs could ask for several tasks to be done simultaneously? Your program as a whole would finish in a fraction of the time that sequential processing takes.

A first example

The new stream_select function, along with a few of its friends, makes this possible. Consider the following example.

Listing 1. Request several HTTP pages simultaneously
       <?php
        echo "Program starts at ". date('h:i:s') . ".\n";

        $timeout=10; 
        $result=array(); 
        $sockets=array(); 
        $convenient_read_block=8192;
        
        /* Issue all requests simultaneously; there's no blocking. */
        $delay=15;
        $id=0;
        while ($delay > 0) {
            $s=stream_socket_client("phaseit.net:80", $errno,
                  $errstr, $timeout,
                  STREAM_CLIENT_ASYNC_CONNECT|STREAM_CLIENT_CONNECT); 
            if ($s) { 
                /* Start with an empty buffer so the later .= has
                   something to append to. */
                $result[$id]=''; 
                $sockets[$id++]=$s; 
                $http_message="GET /demonstration/delay?delay=" .
                    $delay . " HTTP/1.0\r\nHost: phaseit.net\r\n\r\n"; 
                fwrite($s, $http_message);
            } else { 
                echo "Stream " . $id . " failed to open correctly.\n";
            } 
            $delay -= 3;
        } 
        
        while (count($sockets)) { 
            $read=$sockets; 
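            /* stream_select takes its array arguments by reference,
               so pass real variables rather than inline assignments. */
            $write=null; 
            $except=null; 
            stream_select($read, $write, $except, $timeout); 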
            if (count($read)) {
                /* stream_select generally shuffles $read, so we need to
                   compute from which socket(s) we're reading. */
                foreach ($read as $r) { 
                    $id=array_search($r, $sockets); 
                    $data=fread($r, $convenient_read_block); 
                    /* A socket is readable either because it has
                       data to read, OR because it's at EOF. */
                    if (strlen($data) == 0) { 
                        echo "Stream " . $id . " closes at " . date('h:i:s') . ".\n";
                        fclose($r); 
                        unset($sockets[$id]); 
                    } else { 
                        $result[$id] .= $data; 
                    } 
                } 
            } else { 
                /* A time-out means that *all* streams have failed
                   to receive a response. */
                echo "Time-out!\n";
                break;
            } 
        } 
       ?>

If you run this, you'll see output like what's shown below.

Listing 2. Typical output from the program of Listing 1
	 Program starts at 02:38:50.
	 Stream 4 closes at 02:38:53.
	 Stream 3 closes at 02:38:56.
	 Stream 2 closes at 02:38:59.
	 Stream 1 closes at 02:39:02.
	 Stream 0 closes at 02:39:05.

It's important to understand what's happening here. At a high level, this first program makes several HTTP requests and receives the pages the Web server sends it. While a production application would likely address several Web servers — perhaps google.com, yahoo.com, ask.com, etc. — this example sends all its requests to our corporate server at Phaseit.net, simply to reduce complexity.

The Web pages requested return results after a variable delay, shown below. If the program made its requests sequentially, it would take about 15 + 12 + 9 + 6 + 3 = 45 seconds to finish. As Listing 2 shows, it actually finishes in 15 seconds, the time of the slowest single request. Tripling performance is great.
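The arithmetic can be checked with a few lines of PHP. This is only a sketch of the timing model, assuming each request takes exactly its requested delay and ignoring connection overhead; the variable names are illustrative.

```php
<?php
/* Delays requested in Listing 1, in seconds. */
$delays = array(15, 12, 9, 6, 3);

/* Sequential processing waits for every request in turn. */
$sequential = array_sum($delays);

/* Concurrent processing waits only as long as the slowest request. */
$concurrent = max($delays);

$speedup = $sequential / $concurrent;

printf("Sequential: %d s, concurrent: %d s, speed-up: %.1fx\n",
    $sequential, $concurrent, $speedup);
```

In general, the more evenly the delays are distributed, the closer the speed-up comes to the number of simultaneous requests.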

It's PHP V5's new stream_select function that makes this possible. The requests are initiated in a conventional way by opening several stream_socket_clients and writing a GET to each of them that corresponds to http://phaseit.net/demonstration/delay?delay=$DELAY. If you request this URL yourself from a browser, after a few seconds, you'll see:

	  Starting at Thu Apr 12 15:05:01 UTC 2007. 
	  Stopping at Thu Apr 12 15:05:05 UTC 2007. 
	  4 second delay.

The delay server is implemented as CGI, as shown below.

Listing 3. Delay server implementation
	  #!/bin/sh

	  echo "Content-type: text/html

	  <HTML> <HEAD></HEAD> <BODY>"

	  echo "Starting at `date`."
	  RR=`echo $REQUEST_URI | sed -e 's/.*?//'`
	  DELAY=`echo $RR | sed -e 's/delay=//'`
	  sleep $DELAY
	  echo "<br>Stopping at `date`."
	  echo "<br>$DELAY second delay.</body></html>"

Although Listing 3's particular implementation is specific to UNIX®, almost all of this article applies equally well to Windows® (especially after Windows 98) or UNIX installations of PHP. Listing 1, in particular, can be hosted on either operating system. For this purpose, both Linux® and Mac OS X are UNIX variations, and all code here works for either.
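For readers without a UNIX shell on the server, roughly the same delay page could be written in PHP itself. The following is a sketch rather than the code running at phaseit.net: the function names and the 30-second cap are assumptions added for illustration, and the delay is cast to an integer instead of being passed through unchecked.

```php
<?php
/* Hypothetical helper: extract a bounded, non-negative delay
   from a query string such as "delay=4". */
function delay_from_query($query_string, $max_delay = 30)
{
    parse_str($query_string, $params);
    $delay = isset($params['delay']) ? (int) $params['delay'] : 0;
    return max(0, min($delay, $max_delay));
}

/* Hypothetical helper: sleep, then build the same kind of page
   that Listing 3 prints. */
function delay_page($delay)
{
    $start = date('r');
    sleep($delay);
    $stop = date('r');
    return "<html><head></head><body>Starting at $start."
        . "<br>Stopping at $stop."
        . "<br>$delay second delay.</body></html>";
}

/* Under a Web server, QUERY_STRING is set; on the command line
   this falls back to no delay at all. */
$query = isset($_SERVER['QUERY_STRING']) ? $_SERVER['QUERY_STRING'] : 'delay=0';
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/html');
}
echo delay_page(delay_from_query($query)), "\n";
```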

Requests to the delay server are issued in the following order.

Listing 4. Sequence of process launches
	delay=15
	delay=12
	delay= 9
	delay= 6
	delay= 3

The effect of stream_select is to receive results as quickly as possible. In this case, that's the opposite of the order in which the requests were issued. After 3 seconds, the first page is ready to read. This part of the program is also conventional PHP, in this case using fread. Just as in any other PHP program, the reading could equally well be done with fgets.

Processing continues in this same way. The program blocks at stream_select until data is ready. What's crucial is that it begins to read whenever any connection has data, in any order. This is how the program multitasks or concurrently processes results from several requests.
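Once a stream closes, its entry in $result holds the raw HTTP response, headers included. A minimal sketch of separating the two follows; split_http_response is a hypothetical helper, not part of PHP, and the sample response is made up for illustration.

```php
<?php
/* Hypothetical helper: split a raw HTTP response into its status
   line, its headers (keyed by lowercased name), and its body. */
function split_http_response($raw)
{
    /* A blank line separates the header block from the body. */
    $parts = explode("\r\n\r\n", $raw, 2);
    $header_lines = explode("\r\n", $parts[0]);
    $status_line = array_shift($header_lines);

    $headers = array();
    foreach ($header_lines as $line) {
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }

    return array(
        'status'  => $status_line,
        'headers' => $headers,
        'body'    => isset($parts[1]) ? $parts[1] : '',
    );
}

/* Made-up sample in the shape of a delay-server reply. */
$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<br>3 second delay.";
$response = split_http_response($raw);

echo $response['status'], "\n";
echo $response['headers']['content-type'], "\n";
echo $response['body'], "\n";
```

Applied to Listing 1, the helper would be called on each $result[$id] after the corresponding stream has closed.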

Note that this represents no burden on the host CPU. It's not unusual to run across networking programs that call fread in a while loop in such a way that CPU usage zooms to 100 percent. That's not the case here: stream_select responds immediately whenever any reading is possible, yet it imposes a negligible CPU load while waiting between reads.
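One way to observe the idle behavior is to wait on a stream that never receives data and compare wall-clock time with CPU time. The sketch below assumes a local socket pair as a stand-in for a silent remote server and a PHP build where getrusage is available; cpu_seconds is a hypothetical helper.

```php
<?php
/* Hypothetical helper: user plus system CPU time consumed so far. */
function cpu_seconds()
{
    $u = getrusage();
    return $u['ru_utime.tv_sec'] + $u['ru_utime.tv_usec'] / 1e6
         + $u['ru_stime.tv_sec'] + $u['ru_stime.tv_usec'] / 1e6;
}

/* A connected pair of local sockets; nothing is ever written to it. */
$domain = (DIRECTORY_SEPARATOR === '\\') ? STREAM_PF_INET : STREAM_PF_UNIX;
$pair = stream_socket_pair($domain, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

$read = array($pair[0]);
$write = null;
$except = null;

$cpu_before = cpu_seconds();
$wall_before = microtime(true);

/* Wait up to 0.2 seconds for data that will not arrive. */
stream_select($read, $write, $except, 0, 200000);

$wall = microtime(true) - $wall_before;
$cpu = cpu_seconds() - $cpu_before;

printf("Waited %.2f s of wall-clock time using %.3f s of CPU.\n", $wall, $cpu);
```

The wait should last about as long as the time-out while the CPU figure stays close to zero, which is the difference from a loop that polls with fread.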


What you should know about stream_select()

Event-based programming like this is not elementary. While Listing 1 is reduced to its essentials, any coding that involves callbacks or coordination, as is necessarily the case for a multitasking application, will be less familiar than a simple procedural sequence. In this case, most of the challenge centers on the $read array. Notice that it's passed by reference; stream_select returns crucial information by altering the content of $read. Just as pointers have the reputation of being C's great stumbling block, references seem to be the part of PHP that presents the most difficulty to programmers.
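The way stream_select rewrites $read can be seen without any network access. The sketch below uses two local socket pairs as stand-ins for two remote servers; the names $watched, $pair_a, and $pair_b are illustrative. Only one pair has data waiting, so only its stream survives in $read.

```php
<?php
/* Two connected socket pairs play the role of two remote servers. */
$domain = (DIRECTORY_SEPARATOR === '\\') ? STREAM_PF_INET : STREAM_PF_UNIX;
$pair_a = stream_socket_pair($domain, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$pair_b = stream_socket_pair($domain, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

/* The master list is never handed to stream_select directly. */
$watched = array('a' => $pair_a[0], 'b' => $pair_b[0]);

/* Only "b" receives data. */
fwrite($pair_b[1], "hello from b");

/* Work on a copy, because stream_select modifies its arguments. */
$read = $watched;
$write = null;
$except = null;
stream_select($read, $write, $except, 1);

/* Map each ready stream back to its name in the master list. */
$ready_names = array();
foreach ($read as $r) {
    $ready_names[] = array_search($r, $watched, true);
}

echo "Watched: " . count($watched) . ", ready: " . count($read) . "\n";
echo "Ready stream: " . implode(',', $ready_names) . "\n";
```

Copying the master array before every call, as Listing 1 does with $read=$sockets, is what keeps the full list of open streams intact from one pass to the next.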

You can use this technique to issue requests to any number of external Web sites, confident that your program will receive each result as soon as possible, without waiting on other requests. In fact, the same technique correctly handles any TCP/IP connection, not just ones to Web port 80, so you can, in principle, manage LDAP retrievals, SMTP transmissions, SOAP requests, etc.

But that's not all. PHP V5 manages a variety of connections as "streams," not just simple sockets. PHP's Client URL library (CURL) supports HTTPS certificates, FTP uploading, cookies, and much more. (CURL allows PHP applications to use a variety of protocols to connect to servers.) Because CURL provides a stream interface, connectivity is transparent from a program's perspective. The next section shows how stream_select even multiplexes local computations.

Several cautions also accompany stream_select. It's under-documented in that even recent PHP books don't cover it. Several code examples available on the Web simply don't work, or are confusing. The second and third arguments to stream_select, which manage write and exception channels corresponding to the read ones of Listing 1, should almost always be null. With few exceptions, it's a mistake to select on writeable or exceptional channels. Unless you're experienced, stick to readable selects.

Also, there apparently are errors in stream_select at least as late as PHP V5.1.2. Most crucially, the function's return value can't be trusted. While I haven't yet debugged the implementation, my experience is that it's safe to test count($read) as in Listing 1, but not the return value of stream_select itself, despite the official documentation.
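These cautions can be gathered into one small wrapper. The sketch below is an assumption about how one might package them, not an official idiom: ready_streams is a hypothetical helper that selects for reading only, passes null for the other channels, and reports readiness through the contents of the array. It additionally treats a return value of false as an error, which is how current PHP documentation describes failure.

```php
<?php
/* Hypothetical helper: return the subset of $streams that can be
   read right now, or an empty array on time-out or error. */
function ready_streams(array $streams, $timeout_seconds)
{
    if (count($streams) === 0) {
        return array();           /* nothing to wait for */
    }
    $read = $streams;             /* copy; stream_select alters it */
    $write = null;                /* readable selects only */
    $except = null;
    $status = stream_select($read, $write, $except, $timeout_seconds);
    if ($status === false) {
        return array();           /* the select call itself failed */
    }
    return $read;                 /* empty when the time-out expired */
}

$domain = (DIRECTORY_SEPARATOR === '\\') ? STREAM_PF_INET : STREAM_PF_UNIX;
$pair = stream_socket_pair($domain, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

/* Nothing has been written yet, so a zero time-out finds no stream. */
$before = ready_streams(array($pair[0]), 0);

fwrite($pair[1], "data");
$after = ready_streams(array($pair[0]), 1);

echo "Ready before write: " . count($before) . "\n";
echo "Ready after write: " . count($after) . "\n";
```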


Local PHP concurrency

The example and most of the discussion above have focused on how to manage several remote resources simultaneously and receive the results as they arrive, rather than waiting to process each one in the order of the original request. This certainly is an important use of PHP concurrency. Practical applications occasionally can be accelerated by a factor of 10 or more.

What if a slowdown is closer to home? Is there a way to speed up PHP results limited by local processing? There are several. If anything, these are even less well-known than the socket-oriented approach of Listing 1. There are several reasons for this, including:

  • Most PHP pages are fast enough — Better performance would be an advantage, but not enough of one to merit investment in new code.
  • PHP's use in Web pages can render partial accelerations inconsequential — Reordering a calculation so intermediate results become available more quickly doesn't matter when the only criterion of merit is how long it takes to deliver a Web page as a whole.
  • Few local bottlenecks are under PHP's control — Users might complain that it takes 8 seconds to pull up details from an account record, but that's likely to be a constraint of database processing or some other resource external to PHP. Even if we reduce PHP processing to zero, it'll still take more than 7 seconds just for the lookup.
  • Even fewer constraints are parallelizable — Suppose a particular page computes a suggested trade price for a specific listed common stock, and the calculation is sufficiently complicated to require many seconds. The calculation might be intrinsically sequential. There's no apparent way to partition it as "teamwork."
  • Few PHP programmers understand PHP's potential for concurrency. Among the few with performance requirements that admit parallelization, most I've met simply recite that PHP "doesn't do threads" and resign themselves to their existing computational model.

Sometimes we can do better, though. Suppose a PHP page needs to calculate two stock prices, perhaps to compare them, and the underlying host happens to be a multiprocessor. In such a case, we might nearly double performance by assigning the two distinct and time-consuming calculations to different processors.

In the universe of all PHP calculations, such instances are rare. However, because I've found it accurately documented nowhere else, I want to include here a model for such speed-ups.

Listing 5. Run local subprocesses concurrently
          <?php
          echo "Program starts at ". date('h:i:s') . ".\n";
          
          $timeout=10; 
          $streams=array();
          $handles=array();
          $all_pipes=array();
          
          /* First launch a program with a delay of three seconds, then
             one which returns after only one second. */
          $delay=3;
          for ($id=0; $id <= 1; $id++) {
              $error_log="/tmp/error" . $id . ".txt";
              $descriptorspec=array(
                  0 => array("pipe", "r"),
                  1 => array("pipe", "w"),
                  2 => array("file", $error_log, "w")
              );
              $cmd='sleep ' . $delay . '; echo "Finished with delay of ' .
                      $delay . '".';
              $handles[$id]=proc_open($cmd, $descriptorspec, $pipes);
              $streams[$id]=$pipes[1];
              $all_pipes[$id]=$pipes;
              $delay -= 2;
          }
          
          while (count($streams)) { 
              $read=$streams; 
              /* Pass real variables: stream_select takes these by reference. */
              $write=null; 
              $except=null; 
              stream_select($read, $write, $except, $timeout); 
              foreach ($read as $r) { 
                  $id=array_search($r, $streams); 
                  echo stream_get_contents($all_pipes[$id][1]);
                  if (feof($r)) {
                      fclose($all_pipes[$id][0]);
                      fclose($all_pipes[$id][1]);
                      $return_value=proc_close($handles[$id]);
                      unset($streams[$id]); 
                  }
              } 
          } 
         ?>

This program produces output like this:

	  Program starts at 10:28:41.
	  Finished with delay of 1.
	  Finished with delay of 3.

The point here is that PHP launched two independent subprocesses, retrieved the output of the first one to finish, then the output of the second one, even though the latter started earlier. If the host is a multiprocessor machine, and the operating system is correctly configured, the operating system itself takes responsibility for assigning the different subprograms to different processors. This is one way to use PHP to advantage on a multiprocessing host.
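The same pattern extends from sleep commands to real computations. The following sketch assumes a UNIX-like host with the PHP command-line interpreter available through the PHP_BINARY constant; the job names and the summing task are made up for illustration. Each child process sums a range of integers, and the parent collects whichever result is ready first.

```php
<?php
/* Hypothetical jobs: each child sums the integers 1..N. */
$jobs = array('small' => 1000, 'large' => 2000);
$descriptorspec = array(1 => array('pipe', 'w'));   /* capture stdout only */
$handles = array();
$streams = array();
$output = array();

foreach ($jobs as $name => $n) {
    $code = 'echo array_sum(range(1, ' . (int) $n . '));';
    $cmd = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg($code);
    $handle = proc_open($cmd, $descriptorspec, $pipes);
    if ($handle === false) {
        echo "Could not start job $name.\n";
        continue;
    }
    $handles[$name] = $handle;
    $streams[$name] = $pipes[1];
    $output[$name] = '';
}

while (count($streams)) {
    $read = $streams;
    $write = null;
    $except = null;
    stream_select($read, $write, $except, 10);
    if (!count($read)) {
        echo "Time-out!\n";
        break;
    }
    foreach ($read as $r) {
        $name = array_search($r, $streams, true);
        $chunk = fread($r, 8192);
        if ($chunk === false || $chunk === '') {
            /* An empty read on a readable pipe means the child is done. */
            fclose($r);
            proc_close($handles[$name]);
            unset($streams[$name]);
        } else {
            $output[$name] .= $chunk;
        }
    }
}

foreach ($output as $name => $text) {
    echo "$name: $text\n";
}
```

Because each child is a separate operating system process, the host is free to schedule the two sums on different processors.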


Summary

PHP multitasks. PHP doesn't support threading in the way other languages like the Java™ programming language or C++ do, but the examples above show that PHP has more potential for speed-ups than many realize.
