Understanding the Zend Framework, Part 7: Searching

Let's continue on with the "Understanding the Zend Framework" series. In Part 6, you learned how to use the Zend Framework to send e-mail from within your feed reader application. Now, here in Part 7, you will use the Zend Framework to search the titles and content of articles saved via the feed reader application and view the resulting ranked results.

Tyler Anderson (tyleranderson5@yahoo.com), Engineer, Stexar Corp.

Tyler Anderson graduated with a degree in computer science from Brigham Young University in 2004 and is currently in his last semester as a master's student in computer engineering. In the past, he worked as a database programmer for DPMG.com, and he is currently an engineer for Stexar Corp., based in Beaverton, Ore.



18 January 2011 (First published 22 August 2006)

About this series

This series chronicles the building of an online feed reader, Chomp, while explaining all of the major aspects of using the open source PHP Zend Framework.

Part 1 talked about the overall concepts of the Zend Framework, including a list of relevant classes and a general discussion of the MVC pattern. Part 2 expanded on that to show how MVC can be implemented in a Zend Framework application. You also created the user registration and login process, adding user information to the database and pulling it back out again.

See Part 2 for details on installing the Zend Framework and XAMPP.

Parts 3 and 4 dealt with the actual RSS and Atom feeds. In Part 3, you learned how to enable users to subscribe to individual feeds and to display the items listed in those feeds. You also discovered some of the Zend Framework's form-handling capabilities, validating data, and sanitizing feed items. Part 4 explained how to create a proxy to pull data from a site that has no feed.

The rest of the series involves adding value to the Chomp application. Part 5 explained how to use the Zend_PDF module to enable the user to create a customized PDF of saved articles, images, and search results. In Part 6, you used the Zend_Mail module to alert users to new posts. Here in Part 7, you will look at searching saved content and returning ranked results. In Part 8, you will create your own mashup, adding information from Amazon, Flickr, and Yahoo! And in Part 9, you will add Ajax interactions to the site using JavaScript Object Notation (JSON).


Introduction

This article explains how to use the Zend_Search module to search saved blog entries for a particular search term and return ranked results. You will learn:

  • How to use the Zend_Search module and related classes to index and search data.
  • How to perform different types of simple and advanced searches using the Zend_Search module.

At the end of this article, you will be able to search feed entries that have been saved in your feed reader. First, you will build a function that creates the search index and adds new content to the index. Next, you will create two actions that will provide the search functionality: search and viewSearchResults. The search action provides a form to perform searches, and the viewSearchResults action processes the input from the form and displays the ranked results to you.


Building the search index

The Zend Framework provides a search mechanism that's simple to use. It works by creating an index in a directory that's not Web-accessible. You can then add items to the index, each with several searchable fields, and search the indexed items in various ways. That's what this section is all about, so start by creating the index.

Creating and adding an entry to the index

To start, you need a helper function that creates a new item for the search index. Then you create the index if it doesn't exist and add the item to it. Create this helper function at the top of FeedController.php, as shown in Listing 1.

Listing 1. Creating and adding items to the search index
<?php
define('INDEX', 'c:\nonWWWAcessibleDirectory\myIndex');

function addEntryToSearchIndex($url, $contents, 
                               $feedname, $articletitle='')
{
    $doc = new Zend_Search_Lucene_Document();
        
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $url));
    $doc->addField(Zend_Search_Lucene_Field::Text('feedname',
                                                  $feedname));
    if ($articletitle != '') {
        $doc->addField(Zend_Search_Lucene_Field::Text('articletitle', 
                                                      $articletitle));
    }
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', 
                                                      $contents));
        
    if ( !is_dir(INDEX) ) {
        $index = Zend_Search_Lucene::create(INDEX);
    }
    else {
        $index = Zend_Search_Lucene::open(INDEX);
    }
    $index->addDocument($doc);
    $index->commit();
}

class FeedController extends Zend_Controller_Action
...

First, define the non-Web-accessible directory that will contain the search index; this is where the indexed items are stored. The function takes the URL of the article, its contents, the name of the feed or Web page, and the article title. It creates a new document, stored in $doc, which you add to the index later, and then adds up to four fields to it. The first is the URL, which you store as a Text field. A Text field is indexed, and its original value is stored along with the index, so if the entry matches a search you can retrieve its URL and display it to the user.

Next, store the feedname, which is stored the same way as the URL, and if the articletitle is defined, you store that in the index also. Finally, store the contents of the article as an UnStored type. This type means that the data will be indexed as usual, but it will not be stored along with the index, so it won't be retrievable from a matching result. This is OK, since you'll only need the URL, the feed or Web page name, and the article title, if defined.

Last, obtain the index and store it in $index. If the index directory does not exist yet, create a new index with Zend_Search_Lucene::create; otherwise, open the existing one with Zend_Search_Lucene::open. Then add the document ($doc) created earlier to the index and commit it, saving the changes to the index.
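The difference between a stored field (Text) and an index-only field (UnStored) can be easier to see in a toy model. The sketch below is plain PHP and does not use the Zend Framework at all; the ToyIndex class, its array layout, and the example URL are assumptions made up for illustration, and a real Zend_Search_Lucene index is far more involved.

```php
<?php
// Toy model only: illustrates "indexed and stored" versus "indexed, not stored".
class ToyIndex
{
    private $postings = array(); // term => set of document ids containing it
    private $stored = array();   // document id => retrievable field values

    // $fields: name => array('value' => string, 'stored' => bool)
    public function addDocument(array $fields)
    {
        $id = count($this->stored);
        $this->stored[$id] = array();
        foreach ($fields as $name => $field) {
            // Every field is indexed: record which document each word occurs in.
            $terms = preg_split('/\W+/', strtolower($field['value']),
                                -1, PREG_SPLIT_NO_EMPTY);
            foreach ($terms as $term) {
                $this->postings[$term][$id] = true;
            }
            // Only stored fields keep their original value for retrieval.
            if ($field['stored']) {
                $this->stored[$id][$name] = $field['value'];
            }
        }
        return $id;
    }

    // Returns the stored fields of every document containing $term.
    public function find($term)
    {
        $term = strtolower($term);
        $hits = array();
        if (isset($this->postings[$term])) {
            foreach (array_keys($this->postings[$term]) as $id) {
                $hits[] = $this->stored[$id];
            }
        }
        return $hits;
    }
}

$index = new ToyIndex();
$index->addDocument(array(
    'url'      => array('value' => 'http://example.com/post', 'stored' => true),
    'contents' => array('value' => 'Searching with Lucene',   'stored' => false),
));
$hits = $index->find('lucene');
```

The word "lucene" appears only in the unstored contents field, yet the document is still found; the hit simply carries the url field and nothing else, which mirrors why the search results later in this article can show a link but not the article text.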

You now have the means to create and add items to your index. Next, go to the spot in your code where you'll call this function.

Adding entries to the index

Now that you've created the addEntryToSearchIndex function, you can begin adding searchable items to your index. Go to the saveEntryAction method in the FeedController class and add the code in Listing 2.

Listing 2. Adding items to your index
...
                echo 'Error occurred, full text not saved,'.
                     ' please reload.';
                return;
            }
        }

        addEntryToSearchIndex($channelLink,
                              Zend_Filter::noTags($fullText),
                              $feedTitle,
                              $channelTitle);

        $db = Zend_Registry::get('db');
...

Here, you simply call your new function, passing in the URL ($channelLink) and the description or full text of the entry ($fullText). You pass the $fullText string through the Zend_Filter::noTags method so all HTML tags are removed (no need to index those). You also pass in the name of the feed ($feedTitle) and the article title ($channelTitle). Note that every time an entry is saved, no matter which user saves it, that entry is added to your index. A possible task to explore on your own is to make sure no duplicate entries get added.
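To get a feel for what tag stripping does to an entry before it is indexed, here is a rough stand-in built only from PHP's native functions. The helper name plainTextForIndex is hypothetical, and the sketch is not claimed to behave identically to Zend_Filter::noTags; it only approximates the idea with strip_tags.

```php
<?php
// Hypothetical helper: approximates tag stripping with PHP built-ins.
function plainTextForIndex($html)
{
    // Remove markup so tag names and attributes are not indexed as words.
    $text = strip_tags($html);
    // Turn entities such as &amp; back into the characters they represent.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace left behind by the removed markup.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo plainTextForIndex("<p>Hello <b>world</b> &amp; friends</p>\n<p>Bye</p>");
```

Only the words a reader would actually see end up in the string that gets indexed.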

That completes this section. Now go add something to your index by saving a feed entry, as done in Part 5. In the next section, you'll start using your index in new actions you'll create in the FeedController class.


Adding new search actions to the FeedController

So you have an index. Now it's time to use it. This section creates two new action methods: searchAction and viewSearchResultsAction. The first causes a search page to display to a user, and the second performs the search, based on parameters it receives from the first, and displays the results back to the user.

searchAction method

Here, you add the searchAction method to the FeedController class, which displays the searchResults view to the user. Do so, as shown in Listing 3.

Listing 3. The searchAction method
    public function searchAction()
    {
        $view = Zend_Registry::get('view');
        $view->title = "Search Results";
        echo $view->render('searchResults.php');
    }

This takes the $view object out of the Zend registry, as you've done in previous parts of this series, sets the page title, and renders the searchResults view for the user. Next, add a link to the main page that will take you to this part of your feed reader.


Adding a link for searching to the main page

You don't yet have a way to reach the /feed/search area of your feed reader, do you? Add the following link, as shown in Listing 4, to the viewFeeds view in viewFeeds.php.

Listing 4. Modifying the viewFeeds view
...
  [<a href="feed/viewSavedEntries">View Saved Entries/
                                   Generate PDF</a>]<br>
  [<a href="feed/search">Search Saved Entries</a>]<br>
  <h1>CHOMP! The Feed Reader</h1>
...

This simply displays the link to search saved entries to users (see Figure 1).

Figure 1. The modified viewFeeds view

Clicking this link results in an error because the search view doesn't exist yet. You'll create this view next.

Search view

This view allows users to enter a search query to run against your index. Create this view, searchResults.php, as shown in Listing 5.

Listing 5. The search view
<html>
<head>
    <title><?php echo $this->escape($this->title); ?></title>
</head>
<body>
  [<a href='/'>Back to Main Menu</a>]<br>
  <h1><?php echo $this->escape($this->title); ?></h1>
  
  <form method='GET' action='/feed/viewSearchResults'>
    Query: <input name='query'><br>
    Choose a field to search:<br>
    <input type='radio' name='field' value='raw' checked="yes">
        Raw String (allows fancy search types)<br>
    <input type='radio' name='field' value='contents'>
        Contents<br>
    <input type='radio' name='field' value='feedname'>
        Feed Title<br>
    <input type='radio' name='field' value='articletitle'>
        Article Title<br>
    Slop (not allowed for Raw String searches):
        <input name='slop' value='0'><br>
    <input type='Submit' value='Search'>
  </form>
</body>
</html>

This page allows several types of searches, which you'll learn more about in the viewSearchResultsAction method, next. This view provides a form that allows users to enter a search string and a series of radio buttons that allow users to restrict the search to a specific field in the index. Note that the method of the form is GET: performing a search has no side effects, so GET is safe here.

Finally, the form provides a slop field. Slop is the number of positions by which the words of a phrase may drift from the arrangement given in the query. If the slop is 0 and the phrase is "hey you", the words must appear adjacent and in that order. If the slop is 1, then "hey ... you" is also acceptable, where ... stands for a single word. If the slop is 2, then "hey ... ... you" and "you hey" are acceptable (swapping two adjacent words costs two positions). Each additional unit of slop lets the words in the phrase spread further apart, so you can tune how near the words must be for a result to match.
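The slop rule can be modeled in a few lines of plain PHP. The function below, requiredSlop, is a hypothetical helper written for this explanation and is not how Zend_Search_Lucene computes matches internally. It assumes the phrase contains no repeated words, and it measures how far each word sits from where an exact phrase match would place it.

```php
<?php
// Simplified model of phrase slop; not the Zend_Search_Lucene implementation.
// Returns the smallest slop at which $phrase matches $docWords,
// or null if some phrase word is missing from the document.
function requiredSlop(array $phrase, array $docWords)
{
    $phrase = array_values($phrase); // make sure offsets are 0, 1, 2, ...
    $combos = array(array());
    foreach ($phrase as $offset => $term) {
        // Every position at which this word occurs in the document.
        $positions = array_keys($docWords, $term, true);
        if (count($positions) === 0) {
            return null;
        }
        $next = array();
        foreach ($combos as $combo) {
            foreach ($positions as $pos) {
                // Shift by the word's offset in the phrase: in an exact
                // match, all shifted values are equal.
                $next[] = array_merge($combo, array($pos - $offset));
            }
        }
        $combos = $next;
    }
    $best = null;
    foreach ($combos as $combo) {
        $needed = max($combo) - min($combo);
        if ($best === null || $needed < $best) {
            $best = $needed;
        }
    }
    return $best;
}

$phrase = array('hey', 'you');
echo requiredSlop($phrase, array('hey', 'you')), "\n";           // 0
echo requiredSlop($phrase, array('hey', 'there', 'you')), "\n";  // 1
echo requiredSlop($phrase, array('you', 'hey')), "\n";           // 2
```

A document matches a phrase search when the value returned here is less than or equal to the slop entered in the form, which reproduces the three cases described above.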

Preview the search view in Figure 2.

Figure 2. The search view

Next, take a look at the search types available.

Search types

There are several search types, and you'll focus on these:

  • Search any phrase with any slop in one field
  • Advanced searches using a raw string query:
    • Search a phrase by providing a space-delimited string of words: "hey you"
    • Search for some words, but not others: "+hey -you" (make sure the document contains "hey" and not "you")
    • Combine either of the above with a field name for a particular word: "Hey -you feedname:Google"
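To make the raw string notation concrete, the following sketch breaks a raw query into its parts. It is a deliberately tiny illustration in plain PHP: describeRawQuery is a hypothetical function, it ignores quoted phrases and every other feature of the real syntax, and it is not the parser that Zend_Search_Lucene uses.

```php
<?php
// Illustrative only: this is not the Zend_Search_Lucene query parser.
function describeRawQuery($raw)
{
    $clauses = array();
    $tokens = preg_split('/\s+/', trim($raw), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        // A leading + means the word must appear; a leading - means it must not.
        $sign = 'optional';
        if ($token[0] === '+') {
            $sign = 'required';
            $token = substr($token, 1);
        } elseif ($token[0] === '-') {
            $sign = 'prohibited';
            $token = substr($token, 1);
        }
        // "field:word" restricts the word to one field of the document.
        $field = null; // null stands for "no field given"
        if (strpos($token, ':') !== false) {
            list($field, $token) = explode(':', $token, 2);
        }
        $clauses[] = array('sign' => $sign, 'field' => $field, 'term' => $token);
    }
    return $clauses;
}

print_r(describeRawQuery('Hey -you feedname:Google'));
```

For the last query in the list, the sketch reports three clauses: an optional "Hey", a prohibited "you", and an optional "Google" restricted to the feedname field.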

Searching with the above types of advanced querying is a piece of cake for advanced Googlers, but anyone can get a feel for them with practice. Next, add the viewSearchResultsAction method.

viewSearchResultsAction method

This method performs the search and displays results back to the user. Create the viewSearchResultsAction method in the FeedController class, as shown in Listing 6.

Listing 6. The viewSearchResultsAction method
    public function viewSearchResultsAction()
    {
        $input = new Zend_Filter_Input(
            array('*'=>'StringTrim'),
            null,
            $_GET);
        $query = strtolower($input->getUnescaped('query'));
        $slop = $input->getUnescaped('slop');
        $field = $input->getUnescaped('field');
        
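        if ($field != "raw" || $query == '') {
            // Phrase search against a single field, with optional slop.
            $queryObj = new
                Zend_Search_Lucene_Search_Query_Phrase(explode(" ", 
                                                               $query),
                                                       null, $field);
            $queryObj->setSlop((int) $slop);
        }
        else {
            // Raw string: let Zend_Search_Lucene parse the query itself.
            $queryObj = $query;
        }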
		
        if ( !is_dir(INDEX) ) {
            $index = Zend_Search_Lucene::create(INDEX);
        }
        else {
            $index = Zend_Search_Lucene::open(INDEX);
        }
        $hits = $index->find($queryObj);

        $view = Zend_Registry::get('view');
        $view->title = "Search Results for: $query";
        $view->hits = $hits;
        echo $view->render('viewSearchResults.php');
    }

This method retrieves the $query, the $slop, and the $field from the GET array. If the $field is not "raw" or a $query wasn't entered, you create a Zend_Search_Lucene_Search_Query_Phrase object, store it in $queryObj, and set its slop to the value in $slop (this allows the first query type shown earlier). Otherwise, you set $queryObj to $query, the raw search string (this allows the more advanced query types). Then open the index, retrieve the matching results by calling $index->find($queryObj), and store them in $hits. Last, render the viewSearchResults view to display the results to the user. You'll see how this view displays the results next.
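The decision the action makes can be isolated from the framework and exercised on its own. The sketch below is a hypothetical helper, planSearch, in plain PHP; it only describes which kind of search would be run rather than running it. It also shows two defensive choices that are assumptions of this sketch, not part of the listing: splitting on runs of whitespace so repeated spaces do not produce empty terms, and forcing the slop to a non-negative integer.

```php
<?php
// Hypothetical helper, not part of the article's code or of the Zend Framework.
function planSearch($field, $query, $slop)
{
    $query = strtolower(trim($query));
    if ($field === 'raw' && $query !== '') {
        // Raw strings are handed to the query parser untouched.
        return array('type' => 'raw', 'query' => $query);
    }
    // Split on runs of whitespace so "hey   you" does not yield empty terms.
    $terms = preg_split('/\s+/', $query, -1, PREG_SPLIT_NO_EMPTY);
    return array(
        'type'  => 'phrase',
        'terms' => $terms,
        'field' => $field,
        'slop'  => max(0, (int) $slop), // slop arrives as a string from $_GET
    );
}

print_r(planSearch('contents', 'Hey   You', '2'));
print_r(planSearch('raw', '+hey -you', '0'));
```

Keeping this decision in a small function of its own makes it straightforward to check each branch without an index on disk.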

viewSearchResults view

This view iterates over the results returned to it and displays them back to the user. Create this view in a file named viewSearchResults.php, and define it as shown in Listing 7.

Listing 7. The viewSearchResults view
<html>
<head>
    <title><?php echo $this->escape($this->title); ?></title>
</head>
<body>
  [<a href='/'>Back to Main Menu</a>]<br>
  <h1><?php echo $this->escape($this->title); ?></h1>
  
  <table>
    <tr>
      <td></td>
      <td>Title (Click to view article)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td>
      <td>Relevancy</td>
    </tr>
  <?php
     $i = 1;
     foreach ($this->hits as $hit) {
         $score = $hit->score;
         $feedTitle = $hit->feedname;
         $channelTitle = $hit->articletitle;
         $url = $hit->url;

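         $title = $feedTitle;
         if ($channelTitle != '') {
             $title = "$title > $channelTitle";
         }
         // Escape feed-supplied values before writing them into the page.
         $url = $this->escape($url);
         $title = $this->escape($title);
         echo "<tr><td>#" . $i++ . ":</td>";
         echo "<td><a href=\"$url\">$title</a></td>";
         echo "<td>$score</td></tr>";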
     }
  ?>
  </table>
</body>
</html>

This view iterates over each of the hits sent to it from the viewSearchResultsAction method. Each matching hit is returned in ranked order with the first result having the highest relevancy, stored as the score. Here, grab the $score, $feedTitle, $channelTitle, and $url from each $hit, and display them back to the user, with a link being provided so users can view the full text (see Figure 3).
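Because titles and URLs come from external feeds, it can help to see one result row built with every value escaped and the score rounded for display. The sketch below is plain PHP and independent of the view class: renderHitRow is a hypothetical helper, and $hit is an ordinary array standing in for a search hit object.

```php
<?php
// Hypothetical helper; $hit is a plain array standing in for a search hit.
function renderHitRow($rank, array $hit)
{
    $title = $hit['feedname'];
    if ($hit['articletitle'] !== '') {
        $title .= ' > ' . $hit['articletitle'];
    }
    // Escape feed-supplied text so it cannot be interpreted as markup,
    // and show the relevancy score with two decimal places.
    return sprintf(
        '<tr><td>#%d:</td><td><a href="%s">%s</a></td><td>%.2F</td></tr>',
        $rank,
        htmlspecialchars($hit['url'], ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($title, ENT_QUOTES, 'UTF-8'),
        $hit['score']
    );
}

echo renderHitRow(1, array(
    'feedname'     => 'News',
    'articletitle' => 'A <b>bold</b> title',
    'url'          => 'http://example.com/?a=1&b=2',
    'score'        => 0.8123,
));
```

The markup in the article title is shown to the reader as literal text instead of being rendered, and the ampersand in the URL is written as an entity inside the href attribute.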

Figure 3. The viewSearchResults view

Well, that's it. Your feed reader now has searching capabilities.


Summary

You completed Part 7 of this "Understanding the Zend Framework" series by mastering the Zend_Search class in the Zend Framework, which allows you to search the saved entries in your feed reader.

The rest of this series involves adding even more value to the Chomp application. In Part 8, you'll create your own mashup, adding information from Amazon, Flickr, and Yahoo! Finally, in Part 9, you'll add Ajax interactions to the site using JavaScript Object Notation (JSON).


Download

Description           Name                       Size
Part 7 source code    os-php-zend7.source.zip    11KB
