Building web applications with YQL and PHP, Part 2

Use PHP and YQL to retrieve and combine data from multiple web services

The Yahoo! Query Language (YQL) provides a unified interface to multiple third-party web services, allowing you to search, add, update, and delete data through a simple, SQL-like syntax. Combine the powerful XML processing tools in PHP with YQL to easily add data from online services to a web application quickly, efficiently, and without wading through pages of API documentation.

Vikram Vaswani, Founder, Melonfire

Photo of Vikram VaswaniVikram Vaswani is the founder and CEO of Melonfire, a consulting services firm with special expertise in open-source tools and technologies. He is also the author of the books Zend Framework: A Beginners Guide and PHP: A Beginners Guide.



30 November 2010

Also available in Chinese Japanese

Introduction

In the first part of this article, I introduced you to YQL, the Yahoo! Query Language, which offers a unified, SQL-like interface to retrieve data from multiple web service APIs. I demonstrated how to access the YQL service from a PHP application and offered some examples of how YQL lets you make use of SQL-type constructs to filter, sort, and link query results.

In this second and concluding article, I dive deeper into the world of YQL and demonstrate how you can use YQL from within a PHP application to add, edit, and delete data on third-party web services. I also show you how to use YQL to search through and retrieve data from HTML pages as well as structured XML formats such as RSS and Atom.


Adding data with YQL

Frequently used acronyms

  • API: Application program interface
  • CSV: Comma-separated values
  • HTML: HyperText Markup Language
  • RDBMS: Relational database management system
  • REST: REpresentational State Transfer
  • RSS: Really Simple Syndication
  • SDK: Software Development Kit
  • SKU: Stock-keeping unit
  • SQL: Structured Query Language
  • URL: Uniform Resource Locator
  • XML: Extensible Markup Language

As you saw in the previous article, YQL lets you extract data from third-party web services with SELECT statements. SELECT, however, isn't the only SQL statement supported by YQL; you can also use INSERT, DELETE, and UPDATE queries to manipulate data on third-party services using the same SQL syntax.

To illustrate how this works, consider Listing 1, which uses an INSERT query to add a blog post to a user's blog on Wordpress.com. To run this example, you must replace the BLOGNAME, BLOGUSER, and BLOGPASS variables in the listing with valid Wordpress.com account credentials; you can obtain these by signing up for a free Wordpress.com blog (see Resources for a link).

Listing 1. Adding data with a YQL INSERT query
<?php
// set up Zend auto-loader
// load Zend REST client classes
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Rest_Client');

// execute YQL query
// add new post to Wordpress blog using POST
try {
  $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
  $client->env('store://datatables.org/alltableswithkeys');  
  $client->q("INSERT INTO wordpress.post 
    (title, description, blogurl, username, password) VALUES 
    ('Hello world', 'This is my first blog post with YQL...woohoo!', 
    'http://BLOGNAME.wordpress.com', 'BLOGUSER', 'BLOGPASS')");       
  $result = $client->post();
  echo 'Entry posted with ID: ' . $result->results->postid;
} catch (Zend_Rest_Client_Exception $e) {
    echo "Client error: " . $e->getResponse();
    exit;
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
    exit;
}
?>

As Listing 1 illustrates, performing an INSERT query with YQL is identical to performing an INSERT query on any SQL-compliant RDBMS. You must specify the table name to which data is to be inserted, followed by a list of keys and values. YQL uses the table definition to convert the query data into an XML-encoded packet that is submitted, through POST, to the Wordpress.com web service. The response is returned to the client as a standard YQL results document, containing the ID of the newly inserted post.

Figure 1 illustrates the newly added post on the Wordpress.com blog.

Figure 1. A blog post added with YQL
Screen capture of a blog post added with YQL

Listing 2 offers a more interactive version of Listing 1, allowing the user to directly enter content into a web form and converting that content into a Wordpress.com blog post.

Listing 2. Interactively adding blog posts with YQL
<html>
  <head></head>
  <body> 
    <h2>Post Blog Entry</h2> 
    <form method="post" action="<?php echo htmlentities($_SERVER['PHP_SELF']); ?>">
      <div>
        Title: <br/>
        <input type="text" name="title" size="40" /> 
      </div>
      <div>
        Body: <br/>
        <textarea name="body" rows="5" cols="30"></textarea>
      </div>    
      <div>
        Blog URL: <br/>
        http:// <input type="text" name="prefix" size="40" /> .wordpress.com
      </div>
      <div>
        Username: <br/>
        <input type="text" name="user" size="40" />
      </div>
      <div>
        Password: <br/>
        <input type="text" name="pass" size="40" />
      </div>
      <input type="submit" name="submit" value="Post" />    
    </form>
    <?php
    if (isset($_POST['submit'])) {
      // validate input (omitted for brevity)
      $title = $_POST['title'];
      $body = $_POST['body'];
      $blog = 'http://' . $_POST['prefix'] . '.wordpress.com';
      $user = $_POST['user'];
      $pass = $_POST['pass'];

      // set up Zend auto-loader
      // load Zend REST client classes
      require_once 'Zend/Loader.php';
      Zend_Loader::loadClass('Zend_Rest_Client');

      // execute YQL query
      try {
        $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
        $client->env('store://datatables.org/alltableswithkeys');  
        $client->q(
          "INSERT INTO wordpress.post (title, description, blogurl, username, password) 
          VALUES ('$title', '$body', '$blog', '$user', '$pass')");       
        $result = $client->post();
      } catch (Zend_Rest_Client_Exception $e) {
          echo "Client error: " . $e->getResponse();
          exit;
      } catch (Exception $e) {
          echo "Error: " . $e->getMessage();
          exit;
      }

      // iterate over query result set
      echo 'Entry posted with ID: ' . $result->results->postid;
    }
    ?>
  </body>      
</html>

Figure 2 illustrates the web form displayed to the user with some input.

Figure 2. A web form to interactively add a blog post to Wordpress.com
Screen capture of a web form to interactively add a blog post to Wordpress.com

Figure 3 displays the blog post on Wordpress.com after the form is submitted.

Figure 3. The newly added blog post on Wordpress.com
Screen capture of the newly added blog post on Wordpress.com

Updating and deleting data with YQL

Just as you can INSERT, you also can UPDATE and DELETE. Consider Listing 3, which uses an UPDATE query to update the newly created post from Listing 1.

Listing 3. Modifying data with a YQL UPDATE query
<?php
// set up Zend auto-loader
// load Zend REST client classes
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Rest_Client');

// execute YQL query
// update post on Wordpress blog
try {
  $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
  $client->env('store://datatables.org/alltableswithkeys');  
  $client->q(
    "UPDATE wordpress.post SET title='Updated blog post', 
    description='Look, ma, I updated my blog post via YQL', publish='true' 
    WHERE blogurl = 'http://BLOGNAME.wordpress.com' AND 
    username = 'BLOGUSER' AND password = 'BLOGPASS' AND postid = '5'");       
  $result = $client->post();  
  if ($result->methodResponse->params->param->value->boolean == 1) {
    echo 'Post successfully updated';    
  } else {
    throw new Exception ('Could not delete post');  
  }
} catch (Zend_Rest_Client_Exception $e) {
    echo "Client error: " . $e->getResponse();
    exit;
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
    exit;
}
?>

As with any database UPDATE query, you must pass the YQL query a list of fields to be updated, together with their new values, and a WHERE clause that restricts the update to a set of records. In Listing 3, the WHERE clause contains the user credentials as well as the post ID to be updated.

Figure 4 illustrates the updated post on the Wordpress.com blog.

Figure 4. A blog post modified through YQL
Screen capture of a blog post modified through YQL

And Listing 4 demonstrates how to delete a post with YQL.

Listing 4. Deleting data with a YQL DELETE query
<?php
// set up Zend auto-loader
// load Zend REST client classes
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Rest_Client');

// execute YQL query
// delete post from Wordpress blog
try {
  $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
  $client->env('store://datatables.org/alltableswithkeys');  
  $client->setConfig(array('timeout' => 30)); 
  $client->q(
    "DELETE FROM wordpress.post WHERE blogurl = 'http://BLOGNAME.wordpress.com' 
    AND username = 'BLOGUSER' AND password = 'BLOGPASS' AND postid = '5'");       
  $result = $client->post();  
  if ($result->methodResponse->params->param->value->boolean == 1) {
    echo 'Post successfully deleted';    
  } else {
    throw new Exception ('Could not delete post');  
  }
} catch (Zend_Rest_Client_Exception $e) {
    echo "Client error: " . $e->getResponse();
    exit;
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
    exit;
}
?>

Note that as of this writing, the community YQL table for Wordpress.com does not include functional UPDATE and DELETE support, although a patch is in development. If you have problems running the preceding listings, you can instead reference the author's patched versions of these tables, altering the YQL query string to reference the custom table with a USE clause. Here's an example:

USE 'http://github.com/vikram0/yql-tables/raw/master/wordpress/wordpress.post.xml' 
AS wordpress.post; 
DELETE FROM wordpress.post WHERE blogurl = 'http://BLOGNAME.wordpress.com' 
AND username = 'BLOGUSER' AND password = 'BLOGPASS' AND postid = '5'

You might also need to adjust the client timeout in this case, as shown here:

  $client->setConfig(array('timeout' => 30));

Example application: Cloud shopping cart

With all this background information at hand, let's look at building an example application that utilizes the YQL data modification capabilities. Your target web service in this case is Payvment (see Resources for a link), a "shopping cart in the cloud" implementation. As the name suggests, Payvment allows you to create, edit, and modify shopping carts using a REST API, making it possible to easily integrate shopping cart capabilities into a web application. Payvment also makes a community Open Data Table available for YQL, enabling you to perform cart operations using YQL instead of accessing the Payvment REST API.

With this in mind, let's put together an example application that uses YQL and PHP to build a shopping cart in the cloud. This application defines a basic product catalog, with prices and descriptions for each SKU, and allows users to add these items to their Payvment cart using YQL INSERT queries. Users can also modify quantities or remove items from their shopping carts using UPDATE or DELETE queries. You can retrieve the current contents of the shopping cart (with subtotals) at any time by using a SELECT query.

Listing 5 has the complete code.

Listing 5. A shopping cart built using the Payvment YQL community table
<?php
// set API keys and constants
define (PAYVMENT_API_KEY, 'YOUR-API-KEY');
define (PAYVMENT_ID, 'YOUR-ID');

// define simple product catalog
$catalog = array(
  '167' => array('title' => 'Flying hammer', 'price' => 35.99),
  '543' => array('title' => 'Lightsaber', 'price' => 76.99),
  '129' => array('title' => 'Sonic screwdriver', 'price' => 10.99)
);

// set up Zend auto-loader
// load Zend REST client classes
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Rest_Client');

// start session
session_start();

try {
  // check if a cart already exists, if not, get a new cart ID
  if (!isset($_SESSION['cart'])) {
    $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
    $query = "USE 'http://github.com/vikram0/yql-tables/raw/master/shoppingcart/
              shoppingcart.YourProducts.xml' AS shoppingcart; 
              SELECT * FROM shoppingcart WHERE apiKey='" . PAYVMENT_API_KEY . "'";
    $client->q($query);       
    $result = $client->get();
    if ($result->results->cartfeed->status->StatusCode != 200) {
      throw new Exception ('ERROR: Cannot get shopping cart');  
    } 
    $_SESSION['cart'] = (string) $result->results->cartfeed->results->CartID;
  }

  // check for add/edit/delete operations
  switch(strtolower($_REQUEST['op'])) {
    // add = INSERT
    case 'add':
      $id = $_REQUEST['id'];
      $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
      $query = 
        "USE 'http://github.com/vikram0/yql-tables/raw/master/shoppingcart/
        shoppingcart.YourProducts.xml' AS shoppingcart; 
        INSERT INTO shoppingcart (apiKey, cartID, payvmentID, itemID, itemName, 
        itemPrice) VALUES ('" . PAYVMENT_API_KEY . "', '" . 
        $_SESSION['cart'] . "', '" . PAYVMENT_ID . "', '$id', '" . 
        $catalog[$id]['title'] . "', '" . $catalog[$id]['price'] . "')";
      $client->q($query);       
      $result = $client->post();
      if ($result->results->cartfeed->status->StatusCode != 200) {
        throw new Exception ('ERROR: Cannot add item to shopping cart');  
      } 
      break;  
    // remove = DELETE
    case 'remove':
      $id = $_REQUEST['id'];
      $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
      $query = 
        "USE 'http://github.com/vikram0/yql-tables/raw/master/shoppingcart/
        shoppingcart.YourProducts.xml' AS shoppingcart; 
        DELETE FROM shoppingcart WHERE apiKey = '" . PAYVMENT_API_KEY . "' 
        AND cartID = '" . $_SESSION['cart'] . "' AND itemID = '" . $id . "' 
        AND retailerID = '" . PAYVMENT_ID . "'";
      $client->q($query);       
      $result = $client->post();
      if ($result->results->cartfeed->status->StatusCode != 200) {
        throw new Exception ('ERROR: Cannot remove item from shopping cart');  
      }     
      break;  
    // update = UPDATE
    case 'update':
      $id = $_REQUEST['id'];
      $qty = $_REQUEST['qty'];
      $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
      $query = 
        "USE 'http://github.com/vikram0/yql-tables/raw/master/shoppingcart/
        shoppingcart.YourProducts.xml' AS shoppingcart; 
        UPDATE shoppingcart SET itemQty = $qty WHERE apiKey = '" . PAYVMENT_API_KEY . "' 
        AND cartID = '" . $_SESSION['cart'] . "' AND itemID = '" . $id . "' 
        AND retailerID = '" . PAYVMENT_ID . "'";
      $client->q($query);       
      $result = $client->post();
      if ($result->results->cartfeed->status->StatusCode != 200) {
        throw new Exception ('ERROR: Cannot update item in shopping cart');  
      }         
      break;      
  }

  // get current cart contents (summary and individual items)
  $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
  $query = 
    "USE 'http://github.com/vikram0/yql-tables/raw/master/shoppingcart/
    shoppingcart.YourProducts.xml' AS shoppingcart; 
    SELECT * FROM shoppingcart WHERE apiKey='" . PAYVMENT_API_KEY . "' 
    AND cartID='" . $_SESSION['cart'] . "'";
  $client->q($query);       
  $result = $client->get();
  if ($result->results->cartfeed->status->StatusCode == 200) {
    $cart_summary = $result->results->cartfeed->results;
    $cart_items = $result->results->cartfeed->items->storeitems;
  } else {
    throw new Exception ('ERROR: Cannot get shopping cart contents');  
  }  
} catch (Exception $e) {
  die($e->getMessage());  
}
?>
<html>
  <head>
    <style type="text/css">
      table {
                 border-width: 1px;
                 border-spacing: 2px;
                 border-style: outset;
                 border-color: red;
                 border-collapse: collapse;
                 background-color: white;
      }
      table th {
                 border-width: 1px;
                 padding: 4px;
                 border-style: inset;
                 border-color: red;
                 background-color: white;
                 -moz-border-radius: 0px 0px 0px 0px;
        font-weight: bolder;
      }
      table td {
                 border-width: 1px;
                 padding: 4px;
                 border-style: inset;
                 border-color: red;
                 background-color: white;
                 -moz-border-radius: 0px 0px 0px 0px;
      }
      </style>
  </head>
  <body>    
    <h2>Current Shopping Cart</h2>
    <table>
      <tr>
        <th>Item</th>
        <th>Price</th>
        <th>Quantity</th>
        <th>Subtotal</th>
        <th></th>
        <th></th>
      </tr>
      <?php foreach ($cart_items->item as $i): // display current cart ?>
      <form method="post">
      <input type="hidden" name="id" value="<?php echo $i->ItemID; ?>" />
      <tr>
        <td><?php echo $i->ItemName; ?></td>
        <td><?php echo $i->ItemPrice; ?></td>
        <td><input type="text" name="qty" size="3" 
          value="<?php echo $i->ItemQty; ?>" /></td>
        <td><?php echo $i->ItemQtyPrice; ?></td>
        <td><input type="submit" name="op" value="Update" /></td>
        <td><input type="submit" name="op" value="Remove" /></td>
      </tr>  
      </form>
      <?php endforeach; ?>
      <tr>
        <td colspan="3">Total</td>
        <td><?php echo $cart_summary->GrandTotal; ?></td>  
        <td colspan="2"></td>
      </tr>
    </table>  

    <h2>Product Catalog</h2>
    <table>
      <tr>
        <th>Title</th>
        <th>Price</th>
        <th></th>
      </tr>
      <?php foreach ($catalog as $id => $item): // display product catalog ?>
      <form method="post">
      <input type="hidden" name="id" value="<?php echo $id; ?>" />
      <tr>
        <td><?php echo $item['title']; ?></td>
        <td><?php echo $item['price']; ?></td>
        <td><input type="submit" name="op" value="Add" /></td>
      </tr>
      </form>
      <?php endforeach; ?>
    </table>
  </body>
</html>

This code might seem intimidating at first, but it's actually far simpler than it looks. Here's a quick breakdown of the key components of Listing 5:

  • The first few lines of the script define the user's Payvment API key and unique user identifier. These tokens are freely available from the Payvment website (see Resources for a link) on registration. Following this code, the script defines a simple product catalog as an array, with the array keys as SKUs and the corresponding values holding product information. This product catalog is formatted into an HTML table, with controls for the user to add each item to his or her shopping cart.
  • Next, the script initializes a session and checks to see if the session already contains a shopping cart ID. This ID serves to uniquely identify the user's shopping cart across the Payvment API, and it must be included in most API calls. If the shopping cart ID is present in the session, the script continues to the next step; if not, it formulates and executes a dummy SELECT query that returns a new shopping cart ID, and then it saves this ID to the session.
  • Following this code, the script checks to see if the current request includes an add, edit, or delete operation, indicated by the presence of the $_REQUEST['op'] variable. Depending on the operation requested, Zend_Rest_Client performs an INSERT, UPDATE, or DELETE query on the Payvment data table using YQL. Notice that each query includes the API key, user ID, and, in some cases, the ID of the item being modified or deleted.
  • As a final step, the script performs a SELECT query to retrieve the current contents of the shopping cart. This information is then formatted into an HTML table for display, together with appropriate controls to modify quantities or delete items. Notice that the Payvment API automatically calculates item subtotals and the cart total, based on the information provided when adding each item to the cart.

Note that as of this writing, the community YQL table for the Payvment service uses an out-of-date REST API endpoint, although a patch is in development. The previous listing makes use of the author's patched version of the table, available on Github (see Resources).

Figure 5 demonstrates the application in action.

Figure 5. A cloud-based shopping cart
Screen capture of a cloud-based shopping cart and product catalog

There is one important point to note about using INSERT, DELETE, and UPDATE queries over YQL. Most web services that allow these operations require a username, password, and/or API key to perform data modification; you can usually obtain these credentials for free from each service's website. In these cases, the procedure to add or modify data is similar to that in the previous listings, where the credentials form part of the YQL query and are passed to the YQL service by Zend_Rest_Client.

For services that require OAuth authorization, it is better tp use an OAuth-capable toolkit like the Yahoo! Open Social SDK instead of Zend_Rest_Client, as this handles all the details of OAuth authorization for you. You can find a link to this SDK, as well as links to additional documentation, in Resources.


Paginating YQL results

In the previous section of this article, you saw how YQL makes it possible to filter and sort search results, with WHERE clause supports and utility functions to sort, count, and eliminate duplicates from the result set. That's not all you can do, though; YQL also makes it possible to limit the number of records in a result set with both local and remote paging limits.

A remote limit controls the number of records retrieved by YQL from the third-party web service; a local limit controls the number of records returned by YQL to the calling application.

If no limit is specified, YQL typically returns 10 records. To modify this number, pass a limit to the query in parentheses, as in the following example, which sets a remote limit of 15 records:

SELECT * FROM music.artist.popular (15)

To return all records, set the limit to 0, as in the next example:

SELECT * FROM music.artist.popular (0)

Local limits are enforced with the LIMIT and OFFSET clauses, which work much like their SQL counterparts: The LIMIT clause defines the number of records to be returned, while the OFFSET clause specifies the start offset. Here's an example, which returns records 5-9 of a result set:

SELECT * FROM music.artist.popular (15) LIMIT 5 OFFSET 5

With all this control at hand, it's quite easy to implement a simple paging system for YQL results using a pagination widget like Zend_Paginator. Consider Listing 6, which demonstrates one approach to paging YQL results:

Listing 6. Paginating YQL results with Zend_Paginator
<?php
// set up Zend auto-loader
// load Zend REST client classes
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Rest_Client');
Zend_Loader::loadClass('Zend_Paginator');
Zend_Loader::loadClass('Zend_Paginator_Adapter_Array');

try {
  // execute YQL query
  $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
  $client->q("SELECT name FROM music.artist.popular (0)"); 
  $result = $client->get();
  $data = array();
  foreach ($result->results->Artist as $artist) {
    $data[] = (string)$artist['name'];  
  }

  // initialize pager with data set
  $pager = new Zend_Paginator(new Zend_Paginator_Adapter_Array($data));

  // set page number from request
  $currentPage = isset($_GET['p']) ? (int) htmlentities($_GET['p']) : 1;
  $pager->setCurrentPageNumber($currentPage);

  // set number of items per page from request
  $itemsPerPage = isset($_GET['c']) ? (int) htmlentities($_GET['c']) : 10;
  $pager->setItemCountPerPage($itemsPerPage);

  // get pages
  $pages = $pager->getPages();

  // create page links
  $pageLinks = array();
  $separator = ' | ';
  for ($x=1; $x<=$pages->pageCount; $x++) {
    if ($x == $pages->current) {
      $pageLinks[] = $x;      
    } else {
      $q = http_build_query(array('p' => $x, 'c' => $itemsPerPage));
      $pageLinks[] = "<a href=\"?$q\">$x</a>";  
    }
  } 

} catch (Zend_Rest_Client_Exception $e) {
    echo "Client error: " . $e->getResponse();
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}

// iterate over current page set
echo '<h2>Popular Artists</h2>';
$ctr = $pages->firstItemNumber;
foreach ($pager->getCurrentItems() as $item) {
  echo  $ctr . '. ' . $item . '<br/>';  
  $ctr++;
}

// print page links
echo '<div>';
echo 'Pages:' . implode($pageLinks, $separator);
echo '</div>';
?>

Listing 6 first loads the Zend_Paginator component. As the name suggests, this is a ready-made Zend Framework component that you can use to paginate large data sets. It also provides a set of ready-made methods to generate page links, making it easy for users to navigate through the different pages of a result set.

After Zend_Paginator loads, Listing 6 performs an unbounded SELECT query to retrieve a complete set of YQL results. It then massages the returned data into a PHP array and passes it to the Zend_Paginator array data source, which takes care of extracting the array subset required for the current page. Zend_Paginator also takes care of generating the page navigation links, which allow the user to move forward and backward through the result set.

This approach isn't the most efficient way to paginate YQL results, and you should avoid using this example in a production environment. Listing 6 must necessarily re-execute the YQL query on each request, adding a time and performance cost to the pagination exercise. A more efficient approach is to store the YQL result set in a server-side cache and to retrieve the data from there on each request instead of requesting it from the YQL service. Listing 7 illustrates this improvement, using Zend_Cache to hold the YQL result in a server-side file cache.

Listing 7. A more efficient pagination approach using Zend_Paginator and Zend_Cache
<?php
// set up Zend auto-loader
// load Zend REST client classes
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Rest_Client');
Zend_Loader::loadClass('Zend_Paginator');
Zend_Loader::loadClass('Zend_Cache');
Zend_Loader::loadClass('Zend_Paginator_Adapter_Array');

try {

  // set up cache
  $front = array(
     'lifetime' => 2000, 
     'automatic_serialization' => true
  );
  $back = array(
      'cache_dir' => './' 
  );
  $cache = Zend_Cache::factory('Core', 'File', $front, $back);

  // look for YQL data in cache and use if available
  // if not, execute YQL query, get fresh result set and save to cache
  if(!$data = $cache->load('yql')) {     
    // execute YQL query
    // get total count of items
    $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
    $client->q("SELECT name FROM music.artist.popular (0)"); 
    $result = $client->get();
    $data = array();
    foreach ($result->results->Artist as $artist) {
      $data[] = (string)$artist['name'];  
    }
    $cache->save($data, 'yql');
  }

  // initialize pager with data set
  $pager = new Zend_Paginator(new Zend_Paginator_Adapter_Array($data));

  // set page number from request
  $currentPage = isset($_GET['p']) ? (int) htmlentities($_GET['p']) : 1;
  $pager->setCurrentPageNumber($currentPage);

  // set number of items per page from request
  $itemsPerPage = isset($_GET['c']) ? (int) htmlentities($_GET['c']) : 10;
  $pager->setItemCountPerPage($itemsPerPage);

  // get pages
  $pages = $pager->getPages();

  // create page links
  $pageLinks = array();
  $separator = ' | ';
  for ($x=1; $x<=$pages->pageCount; $x++) {
    if ($x == $pages->current) {
      $pageLinks[] = $x;      
    } else {
      $q = http_build_query(array('p' => $x, 'c' => $itemsPerPage));
      $pageLinks[] = "<a href=\"?$q\">$x</a>";  
    }  
  } 

} catch (Zend_Rest_Client_Exception $e) {
    echo "Client error: " . $e->getResponse();
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}

// iterate over current page set
echo '<h2>Popular Artists</h2>';
$ctr = $pages->firstItemNumber;
foreach ($pager->getCurrentItems() as $item) {
  echo  $ctr . '. ' . $item . '<br/>';  
  $ctr++;
}

// print page links
echo '<div>';
echo 'Pages:' . implode($pageLinks, $separator);
echo '</div>';
?>

The basic logic behind Listing 7 is simple: On each request, the script checks the cache to see if the YQL result already exists. If it does, the cached copy is used to populate the Zend_Paginator data array, and pagination proceeds as usual. Only if a cached copy cannot be found or has expired is the YQL query executed, and the results again are saved to the cache for future use.

As an alternative, you also might consider storing the result data in a client-side variable, and using JavaScript to dynamically populate the page container as the user moves between the pages.

Figure 6 illustrates the output:

Figure 6. YQL results with pagination
Screen capture of YQL results with pagination (list of popular artists)

Performing multiple queries

YQL also provides a number of additional utility tables that can be used for various purposes, such as running multiple queries at once or parsing external feeds. To illustrate, consider Listing 8, which uses the special 'yql.query.multi' table to run a set of three queries at once:

Listing 8. Performing a YQL multi-query
<?php
// set up Zend auto-loader
// load Zend REST client classes
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Rest_Client');

// execute multiple YQL queries
// combine results
try {
  $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
  $q=<<<EOF
    SELECT * 
    FROM yql.query.multi 
    WHERE queries="
      SELECT * FROM geo.places WHERE text='london';
      SELECT * FROM geo.places WHERE text='hong kong';
      SELECT * FROM geo.places WHERE text='tanzania'
    "
EOF;
  $client->q($q);
  $result = $client->get();
} catch (Zend_Rest_Client_Exception $e) {
    echo "Client error: " . $e->getResponse();
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}

// iterate over query result set
foreach ($result->results->results as $result) {
  foreach ($result->place as $place) {
    echo '<div>';
    echo '<strong>' . $place->name . ', ' . 
      $place->country . ' (' . $place->placeTypeName . 
      ') </strong> <br/>';
    echo 'Latitude:' . $place->centroid->latitude . 
    ' Longitude:' . $place->centroid->longitude;
    echo '</div>';    
  }
}  
?>

Listing 8 contains three separate YQL queries, each looking up information about a different city from the geo.places table. To make things a little more efficient, these three queries are wrapped into a single query to the yql.query.multi table, and the results of each query are merged into a single result document. Performing a multi-query in this manner offers some benefits: It results in more readable code, and it can also produce faster queries if all the queries are targeted to the same table, as YQL uses internal caching to avoid reading the same table definition file more than once.

Figure 7 illustrates the output of Listing 8.

Figure 7. The merged result of a YQL multi-query
Screen capture of the merged result of a YQL multi-query (entries with town, country, latitude, and longitude)

Parsing XML feeds

Another set of utility tables simplifies parsing and extracting data from documents in standard XML formats, such as RSS or Atom feeds. Consider Listing 9, which illustrates how to use YQL to parse an external RSS feed and extract a subset of information from it using the rss table:

Listing 9. Parsing RSS feeds with YQL
<?php
// set up Zend auto-loader
// load Zend REST client classes
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Rest_Client');

// execute YQL query
// get 5 top stories from Google News RSS feed
try {
  $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
  $client->q("SELECT title FROM rss (5) 
    WHERE url='http://news.google.com/news?ned=us&topic=h&output=rss'");
  $result = $client->get();
} catch (Zend_Rest_Client_Exception $e) {
    echo "Client error: " . $e->getResponse();
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}

// iterate over query result set
echo '<h2>Top Stories</h2>';
foreach ($result->item as $item) {
  echo '<div>' . $item->title . '</div>';
}  
?>

Figure 8 illustrates the output.

Figure 8. A list of Google News stories, extracted from an RSS feed with YQL
Screen capture of a list of Google News stories, extracted from an RSS feed with YQL

In a similar vein, with an xml table, you can apply YQL filters to any XML document, combining it with XPath to retrieve specific node collections. Listing 10 illustrates an example of this technique.

Listing 10. Parsing XML documents with YQL
<?php
// set up Zend auto-loader
// load Zend REST client classes
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Rest_Client');

// execute YQL query
// get 5 top stories from Google News RSS feed
try {
  $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
  $client->q("SELECT title, link FROM xml (5) 
    WHERE url='http://news.google.com/news?ned=us&topic=h&output=rss' 
    AND itemPath='//item'");
  $result = $client->get();
} catch (Zend_Rest_Client_Exception $e) {
    echo "Client error: " . $e->getResponse();
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}

// iterate over query result set
echo '<h2>Top Stories</h2>';
foreach ($result->item as $i) {
  echo '<div><a href="' . $i->link . '">' . $i->title . 
  '</a></div>';
}  
?>

Listing 10 uses a SELECT query to parse an external XML document and apply an XPath expression to the result to select a specific subset of nodes. This ability to apply SQL-like syntax and filters to XML data in a generic fashion can significantly ease the task of extracting information from complex XML document trees.

And Figure 9 illustrates the output.

Figure 9. A list of Google News stories, extracted from an XML document with YQL
Screen capture of a list of Google News stories, extracted from an XML document with YQL

Extracting data from HTML pages

YQL can also extract data from HTML web pages. YQL includes an html utility table, which you can use as a filter to scrape specific pieces of data from an HTML page. XPath provides the expression language necessary to locate and retrieve individual nodes or node collections from the HTML document tree.

To illustrate how this works, assume that you want to obtain a complete list of Indian states and union territories. This information is available on the official Indian Government website, as a formatted HTML table (Figure 10). View a larger version of Figure 10.

Figure 10. The HTML source of a web page containing data on Indian states
Screen capture of the HTML source of a web page containing data on Indian states

With YQL, it's a matter of a single SELECT query to extract it from its source page into a YQL result set, as in Listing 11.

Listing 11. Parsing HTML documents with YQL
<?php
// set up Zend auto-loader
// load Zend REST client classes
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Rest_Client');

// execute YQL query
// scrape GOI page to get list of Indian states
try {
  $client = new Zend_Rest_Client('http://query.yahooapis.com/v1/public/yql');
  $client->q("SELECT content FROM html 
    WHERE url='http://india.gov.in/knowindia/state_uts.php' 
    AND xpath='//ul[@class=\'sec1\']/li/a'");

  $result = $client->get();
} catch (Zend_Rest_Client_Exception $e) {
    echo "Client error: " . $e->getResponse();
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}
// iterate over query result set
echo '<h2>List of States in India</h2>';
echo '<ul>';
foreach ($result->a as $state) {
  echo "<li>$state</li>";
}
echo '</ul>';
?>

Figure 11 shows an illustration of the output.

Figure 11. A list of Indian states, extracted from a web page with YQL
Screen capture of a list of Indian states, extracted from a web page with YQL

Conclusion

Clearly, you can do a lot more with YQL than simply extract information from one or more web services. With YQL, it's possible to add and modify data in third-party web applications using standard INSERT, UPDATE, and DELETE queries, and this ability makes it easy to create new cloud-based applications that read and write data from multiple data sources. YQL also offers a number of utility tables that make it possible to extract information from popular file formats such as RSS, Atom, CSV, and HTML using a combination of YQL XPath filters.

YQL is currently under active development, so expect even more benefits in the future. Feel free to experiment with YQL, or add your own data tables to the community repository even as you begin integrating it into your PHP applications. Happy coding!

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into XML on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=XML, Open source
ArticleID=588057
ArticleTitle=Building web applications with YQL and PHP, Part 2
publish-date=11302010