Distribute the workload of your PHP application with Gearman

Get to know the work-distribution system Gearman, and distribute the workload of applications written in PHP, C, Ruby, or any other supported language.

Martin Streicher, Software Developer, Pixel, Byte, and Comma

Martin Streicher is a freelance Ruby on Rails developer and the former Editor-in-Chief of Linux Magazine. Martin holds a Master of Science degree in computer science from Purdue University and has programmed UNIX-like systems since 1986. He collects art and toys.



15 December 2009

Also available in Japanese

Although the bulk of a Web application may lie in presentation, its value and competitive advantage may lie in a handful of proprietary services or algorithms. If such processing is complex or protracted, it's best performed asynchronously, lest the Web server become unresponsive to incoming requests. Indeed, an especially compute-intensive or specialized function is best performed on one or more separate, dedicated servers.

Frequently used acronyms

  • API: Application programming interface
  • HTTP: Hypertext Transfer Protocol
  • LAMP: Linux, Apache, MySQL, and PHP

Gearman distributes work among a collection of machines, and its PHP library brings that capability to PHP applications. Gearman queues jobs and doles out assignments, sending onerous tasks to machines set aside for them. Gearman libraries are available for Perl, Ruby, C, Python, and PHP developers, and the system runs on any UNIX®-like platform, including Mac OS X, Linux®, and Sun Solaris.

Adding Gearman to a PHP application is easy. Assuming that you host your PHP applications on a typical LAMP configuration, Gearman requires an additional daemon and a PHP extension. As of November 2009, the latest version of the Gearman daemon is 0.10, and two PHP extensions are available — one that wraps the Gearman C library with PHP and one that's written in pure PHP. This tip uses the former. Its latest version is 0.6.0, and its source code is available from PECL or Github (see Resources).

Note: For purposes of this article, a producer is a machine that generates work requests, a consumer is a machine that performs work, and the agent is the intermediary that connects a producer with a suitable consumer.

Installing Gearman

Adding Gearman to a machine requires two steps: building and starting the daemon and building the PHP extension to match your version of PHP. The daemon package includes all the libraries required to build the extension.

To begin, download the latest source code for gearmand, the Gearman daemon, unpack the tarball, and build and install the code. (The installation step requires the privileges of the superuser, root.)

$ wget http://launchpad.net/gearmand/trunk/0.10/+download/gearmand-0.10.tar.gz
$ tar xvzf gearmand-0.10.tar.gz
$ cd gearmand-0.10
$ ./configure
$ make
$ sudo make install

When gearmand is installed, build the PHP extension. You can fetch the tarball from PECL or clone the repository from Github.

$ wget http://pecl.php.net/get/gearman-0.6.0.tgz
$ tar xvzf gearman-0.6.0.tgz
$ cd gearman-0.6.0
#
# or
#
$ git clone git://github.com/php/pecl-gearman.git
$ cd pecl-gearman

Now that you have the code, building the extension is typical:

$ phpize
$ ./configure
$ make
$ sudo make install

The Gearman daemon is commonly installed in /usr/sbin. You can launch the daemon directly from the command line or add the daemon to your startup configuration to launch each time the machine reboots.

Next, you must enable the Gearman extension. Open your php.ini file (you can identify it quickly with the command php --ini), and add the line extension = gearman.so:

$ php --ini
Loaded Configuration File:         /etc/php/php.ini
$ vi /etc/php/php.ini 
...
extension = gearman.so

Save the file. To verify that the extension is enabled, run php --info and look for Gearman:

$ php --info | grep gearman
gearman
gearman support => enabled
libgearman version => 0.10

You can also verify a proper build and installation with a snippet of PHP code. Save this little application to verify_gearman.php:

<?php
  print gearman_version() . "\n";
?>

Next, run the program from the command line:

$ php verify_gearman.php
0.10

If the version number matches that of the Gearman library you built and installed previously, your system is ready.
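If the extension is missing, verify_gearman.php dies with an "undefined function" error rather than a clear message. For a deployment check, a slightly more defensive variant is sketched below. It relies only on PHP's built-in extension_loaded() and the gearman_version() call shown above; the helper name gearman_status() and the expected-version argument are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper: report whether the Gearman extension is usable,
// instead of failing with "Call to undefined function gearman_version()".
function gearman_status($expected_version)
{
    if (!extension_loaded('gearman')) {
        return "gearman extension not loaded; check php.ini";
    }
    // Only call into the extension once we know it is present.
    $actual = gearman_version();
    if ($actual !== $expected_version) {
        return "libgearman $actual found, expected $expected_version";
    }
    return "libgearman $actual: ready";
}

print gearman_status("0.10") . "\n";
```

Pass whatever version you built; a mismatch usually means PHP picked up an older libgearman elsewhere on the system.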


Running Gearman

As mentioned earlier, a Gearman configuration has three kinds of actors:

  • One or more producers generate work requests. Each work request names the function it wants, such as email_all or analyze.
  • One or more consumers fulfill demand. Each consumer names the function or functions it provides and registers those capabilities with the agent. A consumer is also called a worker.
  • The agent collectively catalogs all services provided by consumers that contact it. It marries producers with capable consumers.

You can experiment with Gearman quickly right from the command line:

  1. Launch the agent, the Gearman daemon:
    $ sudo /usr/sbin/gearmand --daemon
  2. Run a worker with the command-line utility gearman. Each worker registers a function name and can run any command-line utility. For example, you can create a worker that lists the contents of a directory. The -f argument names the function the worker provides. The worker stays in the foreground, waiting for requests:
    $ gearman -w -f ls -- ls -lh
  3. The last piece of the puzzle is a producer, a job that generates work requests. In a second terminal window, generate a request with gearman, too. Again, use the -f option to name the service you want help from:
    $ gearman -f ls < /dev/null
    drwxr-xr-x@ 43 supergiantrobot  staff   1.4K Nov 15 15:07 gearman-0.6.0
    -rw-r--r--@  1 supergiantrobot  staff    29K Oct  1 04:44 gearman-0.6.0.tgz
    -rw-r--r--@  1 supergiantrobot  staff   5.8K Nov 15 15:32 gearman.html
    drwxr-xr-x@ 32 supergiantrobot  staff   1.1K Nov 15 14:04 gearmand-0.10
    -rw-r--r--@  1 supergiantrobot  staff   5.3K Jan  1  1970 package.xml
    drwxr-xr-x  47 supergiantrobot  staff   1.6K Nov 15 14:45 pecl-gearman

Using Gearman from PHP

Using Gearman from PHP is similar to the previous example, except that you create the producer and consumer actors in PHP. The work of each consumer is encapsulated in one or more PHP functions.

Listing 1 shows a Gearman worker written in PHP. Save the code in a file named worker.php.

Listing 1. Worker.php
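<?php
  // Connect to the agent (gearmand) on localhost at the default port.
  $worker = new GearmanWorker();
  $worker->addServer();
  // Advertise the "title" function and map it to a PHP callback.
  $worker->addFunction("title", "title_function");
  // Loop as long as work() succeeds; each call waits for one job and runs it.
  while ($worker->work());

  // Title-case the job's payload; the return value is sent back to the client.
  function title_function($job)
  {
    return ucwords(strtolower($job->workload()));
  }
?>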

Listing 2 shows a producer, or client, written in PHP. Save this code in a file named client.php.

Listing 2. Client.php
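<?php
  // Connect to the agent on localhost at the default port.
  $client = new GearmanClient();
  $client->addServer();
  // Run "title" synchronously: do() blocks until a worker returns a result.
  // (Version 1.0 and later of the extension deprecate do() in favor of doNormal().)
  print $client->do("title", "AlL THE World's a sTagE");
  print "\n";
?>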

You can now connect the client to the worker from the command line:

$ php worker.php &
$ php client.php
All The World's A Stage
$ jobs
[3]+  Running                 php worker.php &

The worker application continues to run, ready to serve another client.
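A long-running worker often offers more than one function and should stop cleanly when the extension reports a problem. The sketch below extends Listing 1 along both lines. The "reverse" function, reverse_function(), and run_worker() are hypothetical additions; addFunction(), work(), returnCode(), and the GEARMAN_SUCCESS constant come from the Gearman PHP extension.

```php
<?php
// Same transformation as Listing 1.
function title_function($job)
{
    return ucwords(strtolower($job->workload()));
}

// Hypothetical second function, added for illustration.
function reverse_function($job)
{
    return strrev($job->workload());
}

function run_worker()
{
    $worker = new GearmanWorker();
    $worker->addServer();                       // localhost, default port
    $worker->addFunction("title", "title_function");
    $worker->addFunction("reverse", "reverse_function");

    // work() waits for one job, runs the matching callback, and returns.
    while ($worker->work()) {
        // Stop if the last call reported anything other than success.
        if ($worker->returnCode() != GEARMAN_SUCCESS) {
            fwrite(STDERR, "worker error: " . $worker->returnCode() . "\n");
            break;
        }
    }
}

if (!extension_loaded('gearman')) {
    fwrite(STDERR, "gearman extension not loaded\n");
} else {
    run_worker();
}
```

Keeping each job's logic in a plain function, separate from the worker loop, also makes the functions easy to test without a running agent.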


Advanced features of Gearman

There are many possible uses for Gearman in a Web application. You can import large amounts of data, send reams of e-mail, encode video files, mine data, and build a central log facility — all without affecting the experience and responsiveness of your site. You can process data in parallel. Moreover, because the Gearman protocol is language and platform independent, you can mix programming languages in your solution. You can write a producer in PHP but the workers in C, Ruby, or any language for which a Gearman library is available.
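To process data in parallel from PHP, a client can queue several tasks and submit them together. The sketch below assumes the "title" worker from Listing 1 is running against a local agent; addTask(), setCompleteCallback(), and runTasks() are methods of the extension's GearmanClient class, while titles_in_parallel() is a hypothetical helper.

```php
<?php
// Sketch: submit several "title" jobs together and collect the results as
// workers finish them. Assumes the worker from Listing 1 is running locally.
function titles_in_parallel(array $inputs)
{
    $client = new GearmanClient();
    $client->addServer();

    $keys = array();     // unique ID => caller's input key
    $results = array();  // caller's input key => title-cased text

    // Called once for each finished task; unique() returns the ID we assigned.
    $client->setCompleteCallback(function ($task) use (&$results, &$keys) {
        $results[$keys[$task->unique()]] = $task->data();
    });

    foreach ($inputs as $key => $text) {
        // Gearman merges jobs that share a unique ID, so give each its own.
        $unique = uniqid("title-", true);
        $keys[$unique] = $key;
        // addTask() only queues the request inside the client; nothing is sent yet.
        $client->addTask("title", $text, null, $unique);
    }

    // runTasks() sends every queued task and returns once all have finished.
    if (!$client->runTasks()) {
        return false;
    }
    return $results;
}
```

With the Listing 1 worker running, titles_in_parallel(array("a" => "HELLO wORLD", "b" => "gearman ROCKS")) should return array("a" => "Hello World", "b" => "Gearman Rocks"). Start more workers, on one host or several, and the agent spreads the tasks among them.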

A Gearman network, tying clients to workers, can take virtually any shape you can imagine. Many configurations run multiple agents and scatter workers on numerous machines. Load balancing is implicit: Each operational and available worker, perhaps many per worker host, pulls jobs from the queue. A job can run synchronously or asynchronously and with a priority.
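The sketch below shows an asynchronous, high-priority request. doHighBackground() returns a job handle immediately (doBackground() and doLowBackground() are the normal- and low-priority variants), and jobStatus() reports whether the agent still knows the job, whether it is running, and any progress a worker has reported with GearmanJob::sendStatus(). The helper run_in_background() and its $max_polls limit are hypothetical. To spread clients across several agents, GearmanClient::addServers() accepts a comma-separated list of host:port pairs.

```php
<?php
// Sketch: queue a high-priority background job and poll its status.
// Assumes a local gearmand and the "title" worker from Listing 1.
function run_in_background($workload, $max_polls = 30)
{
    $client = new GearmanClient();
    $client->addServer();

    // Returns at once with a job handle; the result is never sent back.
    $handle = $client->doHighBackground("title", $workload);
    if ($client->returnCode() != GEARMAN_SUCCESS) {
        return false;
    }

    for ($i = 0; $i < $max_polls; $i++) {
        // array(known, running, numerator, denominator)
        $status = $client->jobStatus($handle);
        if (!$status[0]) {
            return true;   // the agent no longer knows the job: it finished (or was dropped)
        }
        printf("running=%s progress=%d/%d\n",
            $status[1] ? "yes" : "no", $status[2], $status[3]);
        sleep(1);
    }
    return false;          // gave up waiting
}
```

Because a background job's result is discarded, a worker used this way typically writes its output somewhere the application can read later, such as a database.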

Recent releases of Gearman have expanded the system's features to include persistent job queues and a new protocol to submit work requests via HTTP. For the former, the Gearman work queue remains in memory but is backed by a relational database. Thus, if the Gearman daemon fails, it can recreate the work queue on restart. Another recent refinement added queue persistence via a memcached cluster. The memcached store relies on memory, too, but is distributed over several machines to preclude a single point of failure.

Gearman is a nascent but capable work-distribution system. According to Gearman author Eric Day, Yahoo! uses Gearman across 60 or more servers to process 6 million jobs per day. News aggregator Digg has built a Gearman network of similar size to crunch 400,000 jobs per day. You can find an elaborate example of Gearman in Narada, an open source search engine (see Resources).

Future releases of Gearman will collect and report statistics, provide advanced monitoring, and cache job results, among other things. To track the Gearman project, subscribe to its Google group, or visit its IRC channel, #gearman, on Freenode.

Resources

Learn

Get products and technologies

Discuss
