Build a searchable CV database with IBM Bluemix and PHP, Part 1

Build a mobile-optimized, data-driven CV search app

Start by indexing the contents of PDF files using the Slim PHP micro-framework with Bootstrap and Bluemix

Recruiters and hiring managers have a thankless job. Typically, they're under pressure from companies looking for a particular set of skills (usually by yesterday) and from job seekers who are looking to maximize their compensation and perks. They also have a massive amount of data to sift through when trying to make a match, typically in the form of candidate CVs (résumés), online skill profiles, and interview notes.

That's where this tutorial comes in.

I will show you how to build a web application that makes it easier for overworked recruiters and hiring managers to store and search CVs and thereby find the perfect match for their client or employer.

In the process, I'll also introduce you to some interesting services from IBM Bluemix® and walk you through the process of hosting the final application on the Bluemix platform.

What you will need

The example application will allow users to upload CVs as PDFs from their computers to an online document store. As each CV is uploaded, its contents will be automatically extracted and stored in a search index. The application will also provide a search interface to the index so users can search stored CVs by skill keyword (for example, "PHP" or "Node.js" for a database of developer CVs) to quickly identify candidates who may meet a company's requirements. Needless to say, the application will be mobile-optimized and suitable for use on both smartphones and desktop computers.

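Behind the scenes, the application works by orchestrating two services, both available directly through IBM Bluemix: Searchly, a hosted search service based on Elasticsearch, which indexes the extracted CV text, and Object Storage, which holds the uploaded PDF files.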

The application interface will be built with Bootstrap, which is the Swiss Army knife of mobile-friendly user interfaces. It will use the Slim PHP micro-framework to manage application flow, the PDF Parser PHP library for PDF content extraction, and Bluemix for infrastructure and hosting.

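There are a lot of technologies in use here. To follow along, you'll need at least a Bluemix account, a PHP development environment (the Composer configuration in Step 1 requires PHP 5.6 or later), and Composer, the PHP dependency manager.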

Note: Any application that uses the Searchly service must comply with the Searchly Terms of Service. Similarly, any application that uses the Object Storage service must comply with its terms of use. Before beginning your project, spend a few minutes reading these requirements and ensuring that your application complies with them.

Step 1: Create the bare application

The first step is to initialize a basic application with the Slim PHP micro-framework. Additional packages are needed for PDF extraction, Elasticsearch index usage, and Bluemix Object Storage access. All these dependencies can be easily downloaded and installed using Composer, the PHP dependency manager. Use this Composer configuration file, which should be saved to $APP_ROOT/composer.json ($APP_ROOT refers to your project directory):

{
    "require": {
        "php": ">=5.6.0",
        "slim/slim": "^3.1",
        "slim/php-view": "^2.0",
        "smalot/pdfparser": "*",
        "elasticsearch/elasticsearch": "~1.0",
        "php-opencloud/openstack": "*"
    },
    "minimum-stability": "dev",
    "prefer-stable": true
}

Install using Composer with the command:

shell> php composer.phar install

Once Composer has downloaded the necessary components, create three directories under $APP_ROOT: public for all web-accessible files, templates for all views, and src for configuration and other non-public files. The commands below assume the project directory is named myapp.

shell> cd myapp
shell> mkdir src public templates

Create the file $APP_ROOT/src/settings.php with the following information:

<?php
return [
    'settings' => [
        'displayErrorDetails' => true, // set to false in production
        'addContentLengthHeader' => false, // Allow the web server to send the content-length header

        // Renderer settings
        'renderer' => [
            'template_path' => '../templates/',
        ],
        
        'indexer' => [

        ],
        
        'object-store' => [

        ],
        
    ],
];

The next step is to create a controller script that will initialize the Slim framework. It will also contain callbacks for each of the application's routes, with each callback defining the code to be executed when the route is matched to an incoming request. Create this script at $APP_ROOT/public/index.php with the following content:

<?php
set_time_limit(600);

use GuzzleHttp\Psr7\Stream;
use GuzzleHttp\Client;

// load required files
require __DIR__ . '/../vendor/autoload.php';

// initialize application
$settings = require '../src/settings.php';

$app = new \Slim\App($settings);

// configure dependencies
$container = $app->getContainer();

// view renderer
$container['renderer'] = function ($c) {
  $config = $c->get('settings');
  return new Slim\Views\PhpRenderer($config['renderer']['template_path']);
};


$app->get('/', function ($request, $response, $args) {
  // insert code here 
});

$app->get('/index', function ($request, $response, $args) {
  // insert code here 
})->setName('index');

$app->get('/add', function ($request, $response, $args) {
  // insert code here 
})->setName('add');

$app->post('/add', function ($request, $response, $args) {
  // insert code here 
});

$app->get('/search', function ($request, $response, $args) {
  // insert code here 
})->setName('search');

$app->get('/download/{id}', function ($request, $response, $args) {
// insert code here 
})->setName('download');

// run app
$app->run();

Slim works by defining callback functions for HTTP methods and endpoints. This is done by calling the corresponding Slim method (get() for GET requests, post() for POST requests, and so on) and passing the route to be matched as the first argument. The second argument is a function that specifies the actions to be taken when the route is matched to an incoming request.
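
To make the route-to-callback idea concrete, here is a minimal sketch in plain PHP of how a table of methods and path patterns might dispatch to callbacks. It is a toy illustration only, not Slim's actual implementation; the TinyRouter class and its methods are hypothetical names invented for this example.

```php
<?php
// Toy illustration of callback routing. TinyRouter is a hypothetical class
// written for this example; it is NOT how Slim is implemented internally.
class TinyRouter
{
    private $routes = [];

    // Register a callback for GET requests on a path pattern.
    public function get($pattern, callable $callback)
    {
        $this->routes[] = ['GET', $pattern, $callback];
    }

    // Register a callback for POST requests on a path pattern.
    public function post($pattern, callable $callback)
    {
        $this->routes[] = ['POST', $pattern, $callback];
    }

    // Find the first route matching the method and path, then run its callback.
    public function dispatch($method, $path)
    {
        foreach ($this->routes as $route) {
            list($routeMethod, $pattern, $callback) = $route;
            if ($routeMethod !== $method) {
                continue;
            }
            // Turn "/download/{id}" into a regex with a named group "id".
            $regex = preg_replace('#\{(\w+)\}#', '(?P<$1>[^/]+)', $pattern);
            if (preg_match('#^' . $regex . '$#', $path, $matches)) {
                // Keep only the named placeholders as route arguments.
                $args = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
                return call_user_func($callback, $args);
            }
        }
        return null; // no route matched
    }
}

$router = new TinyRouter();
$router->get('/add', function ($args) {
    return 'show upload form';
});
$router->post('/add', function ($args) {
    return 'process upload';
});
$router->get('/download/{id}', function ($args) {
    return 'download CV ' . $args['id'];
});
```

Note how the same /add path maps to two different callbacks depending on the HTTP method, which is the pattern the controller script relies on for displaying and processing the upload form.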

Since the application will support adding, searching, and downloading CVs, the script defines routes and placeholders for the /index, /add, /search, and /download URL endpoints. These will be filled in as we progress through the tutorial. The script also reads in settings from the application configuration file created previously, initializes the template renderer, and registers it with Slim.

The final bit of preparation is to create a simple Bootstrap-based user interface with header, footer, and content areas. Here's an example, which will be used for all the application pages shown in subsequent code listings:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>CV Database</title>
    <link rel="stylesheet" 
      href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css">
    <link rel="stylesheet" 
      href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap-theme.min.css">
    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
    <!--[if lt IE 9]>
      <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->    
  </head>
  <body>

    <div class="container">

      <div class="panel panel-default">
        <div class="panel-heading clearfix">
          <h4 class="pull-left">Skill Database</h4>
          <a href="<?php echo $data['router']->pathFor('index'); ?>"
            class="pull-right btn btn-primary">Home</a>
        </div>
      </div> 

      <!-- page content here -->            

    </div>
      
    <div class="container">
      <!-- page footer here -->         
    </div> 
    
  </body>
</html>

Step 2: Create a CV upload form

With the basic application skeleton defined, the first step is to decide the inputs the application will support and define an input form to accept them. Let's assume that the application will accept CVs in PDF format, since that's the most common format used by job seekers, and that the system should also allow recruiters to enter each candidate's name, email address, online profile URL, and free-form comments or notes.

Create a form for these inputs at $APP_ROOT/templates/add.phtml with the following content (note that the common header and footer areas from the previous step are excluded in this and subsequent code listings):

<?php if (!isset($_POST['submit'])): ?>
<div>
  <form method="post" enctype="multipart/form-data" 
    action="<?php echo $data['router']->pathFor('add'); ?>">
    <input type="hidden" name="MAX_FILE_SIZE" value="10000000" />
    <div class="form-group">
      <label for="name">Candidate's name</label>
      <input type="text" name="name" id="name" class="form-control" />
    </div>
    <div class="form-group">
      <label for="email">Candidate's email address</label>
      <input type="text" name="email" id="email" class="form-control" />
    </div>
    <div class="form-group">
      <label for="url">Candidate's profile URL</label>
      <input type="text" name="url" id="url" class="form-control" />
    </div>
    <div class="form-group">
      <label for="upload">Candidate's CV</label> <br/>
      <span class="btn btn-default btn-file">
        <input type="file" name="upload" id="upload" />
      </span>
    </div>  
    <div class="form-group">
      <label for="notes">Notes</label>
      <textarea name="notes" id="notes" 
        class="form-control"></textarea>
    </div>
  <div class="form-group">
    <button type="submit" name="submit" 
      class="btn btn-primary">Add</button>
  </div>          
  </form>
</div>
<?php else: ?>
<div>
  <div class="alert alert-success">
    <strong>Success!</strong> 
      The CV was successfully added with identifier 
      <strong><?php echo $data['id']; ?></strong>. 
      <a role="button" class="btn btn-primary" 
      href="<?php echo $data['router']->pathFor('add'); ?>">
      Add another?</a>
  </div>
</div>
<?php endif; ?>

This form consists of fields for the candidate's name, email address, website URL, and CV. It also includes a text area for additional notes.

Update the application's controller script at $APP_ROOT/public/index.php to render this template whenever the user requests the /add URL route:

<?php
// Slim application initialization - snipped
$app->get('/add', function ($request, $response, $args) {
  return $this->renderer->render($response, 
    'add.phtml', array('router' => $this->router));
})->setName('add');
// other callbacks - snipped

Browsing to the /add URL endpoint should now display a form like the one below.

Figure 1. CV upload form
Image shows CV upload form

On submission, the data entered into the form is submitted as a POST request to the /add callback handler. This callback handler will do most of the heavy lifting in the application. More specifically:

  • It will validate the input submitted in the form.
  • It will retrieve the uploaded file and verify that it is a PDF file.
  • It will extract the content of the PDF file and pass it to the indexer service for indexing, assigning it a unique identifier in the process.
  • It will save the PDF file to the Object Storage service.

From the above, it should be clear that before implementing the callback handler, it is necessary to initialize the indexing and Object Storage services. The following section discusses the indexing service, while Part 2 discusses the Object Storage service.

Step 3: Initialize the Searchly service

Bluemix offers a number of services, one of which is Searchly, a hosted search service based on Elasticsearch. The starter plan, which is free, provides access to a limited number of indices and disk storage.

To see how this works, initialize a new Searchly service instance on Bluemix by logging in to your Bluemix account and, from the dashboard, clicking the Catalog button. From the resulting list of services, select Application Services, then Searchly. Select the Starter plan, then click Create to create the service.

Figure 2. Searchly service creation
Image shows Searchly service creation

From the service details page, click the Open Searchly Dashboard tab to view the Searchly dashboard. From the bottom of the main dashboard page, note the Connection URL and copy this to the indexer[url] key in $APP_ROOT/src/settings.php.
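
After this step, the indexer block in $APP_ROOT/src/settings.php might look like the fragment below. The value shown is a made-up placeholder, not a real endpoint; substitute the Connection URL from your own dashboard. Since the indexer code in Step 4 appends ':80' to this value, the sketch assumes the URL is stored without a port.

```php
        'indexer' => [
            // Placeholder only: paste your own Searchly Connection URL here,
            // without a trailing port (Step 4 appends ':80').
            'url' => 'YOUR-SEARCHLY-CONNECTION-URL',
        ],
```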

Figure 3. Searchly service credentials
Image shows Searchly service credentials

From the Searchly dashboard, select the Home > Indices menu, click New Index, and create a new index: cvs.

Figure 4. Searchly index creation
Image shows Searchly index creation

The index should now be created with status "open."

Figure 5. Searchly index creation
Image shows Searchly index creation

Step 4: Upload and index CVs

Once the indexing system is initialized, the next step is to initialize the PDF parser and the PHP Elasticsearch client, both of which were downloaded through Composer in Step 1. Add the following code to $APP_ROOT/public/index.php, prior to the callback functions:

<?php
// Slim application initialization - snipped
// configure dependencies
$container = $app->getContainer();

// pdf parser
$container['pdfparser'] = function ($c) {
  return new Smalot\PdfParser\Parser();
};

// indexer
$container['indexer'] = function ($c) {
  $config = $c->get('settings');
  $params['hosts'] = array($config['indexer']['url'] . ':80');
  return new Elasticsearch\Client($params);
};

The code above uses the Slim dependency injection container to configure and prepare the PDF parser and indexer client for use.
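
Slim 3's default container builds on Pimple, where a closure registered under a name runs on first access and its result is then reused. The following plain-PHP sketch illustrates that lazy-factory idea under stated assumptions: TinyContainer, the indexerHosts service, and the example URL are all hypothetical and exist only for this illustration.

```php
<?php
// Toy container illustrating lazy service factories. TinyContainer is a
// hypothetical class for this example, not Slim's real container.
class TinyContainer
{
    private $factories = [];
    private $instances = [];

    // Register a factory; nothing is constructed yet.
    public function set($name, callable $factory)
    {
        $this->factories[$name] = $factory;
    }

    // Build the service on first access, then reuse the same instance.
    public function get($name)
    {
        if (!array_key_exists($name, $this->instances)) {
            if (!isset($this->factories[$name])) {
                throw new InvalidArgumentException("Unknown service: $name");
            }
            $this->instances[$name] = call_user_func($this->factories[$name], $this);
        }
        return $this->instances[$name];
    }
}

$container = new TinyContainer();
$built = 0; // counts how many times the factory below actually runs

$container->set('settings', function ($c) {
    // Made-up URL, standing in for the value stored in settings.php.
    return ['indexer' => ['url' => 'http://search.example.test']];
});

$container->set('indexerHosts', function ($c) use (&$built) {
    $built++;
    $config = $c->get('settings');
    // Same host-building step as the tutorial's indexer factory.
    return [$config['indexer']['url'] . ':80'];
});
```

The practical consequence is that the PDF parser and the indexer client are only constructed for requests that actually use them, such as the POST /add handler.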

Once the form is submitted, it goes to a form processor, which accepts and validates the submission and extracts and indexes PDF content. Here's the code, which should be added to $APP_ROOT/public/index.php:

<?php
// Slim application initialization - snipped
$app->post('/add', function ($request, $response, $args) {
  
  $post = $request->getParsedBody();
  $files = $request->getUploadedFiles();
  
  try {
  
    // check for valid inputs
    if (empty($post['name'])) {
      throw new Exception('No name provided');
    }

    if (empty($post['email']) || (filter_var($post['email'], 
      FILTER_VALIDATE_EMAIL) == false)) {
      throw new Exception('Invalid email address provided');
    }

    if (!empty($post['url']) && (filter_var($post['url'], 
      FILTER_VALIDATE_URL) == false)) {
      throw new Exception('Invalid URL provided');
    }
        
    // check for valid file upload
    if (empty($files['upload']->getClientFilename())) {
      throw new Exception('No file uploaded');
    }
    
    // check for valid file type
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($files['upload']->file);
    if ($type != 'application/pdf') {
      throw new Exception('Invalid file format, only PDF supported');    
    }
    
    // extract text from PDF
    $pdf = $this->pdfparser->parseFile($files['upload']->file);
    $text = $pdf->getText();
    
    // add text to index
    $document = array(
        'name' => strip_tags($post['name']),
        'email' => strip_tags($post['email']),
        'content' => $text,
        'url' => strip_tags($post['url']),
        'notes' => strip_tags($post['notes']),   
     );
     
    $params = array();
    $params['body']  = $document;
    $params['index'] = 'cvs';
    $params['type']  = 'doc';
    $indexerResponse = $this->indexer->index($params);
    $id = $indexerResponse['_id'];

    return $this->renderer->render($response, 
      'add.phtml', array('router' => $this->router, 'id' => $id));
    
  } catch (\GuzzleHttp\Exception\ClientException $e) {
    // rethrow HTTP client errors with a readable message
    throw new Exception($e->getMessage());
  }
});

// other callbacks - snipped

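This code defines a callback to handle form submissions via POST. It begins by collecting the input parameters (name, email address, URL, and notes) and validating them: the name and email address are required, the email address must pass PHP's FILTER_VALIDATE_EMAIL filter, and the URL, if supplied, must pass FILTER_VALIDATE_URL. The uploaded file is then checked to ensure that it is a PDF by inspecting its MIME type with the finfo class.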
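
As a rough sketch, the same checks could be pulled out of the handler into standalone functions, which makes them easier to exercise in isolation. The function names below are invented for this example. The hasPdfSignature() helper is an additional assumption: it looks for the "%PDF-" marker at the start of the file and is meant to complement, not replace, the finfo MIME check used in the handler.

```php
<?php
// Hypothetical helpers restating the checks from the /add handler as
// standalone functions. The names are invented for this example.

// Returns a list of error messages; an empty list means the input is valid.
function validateCvInput(array $post)
{
    $errors = [];

    if (empty($post['name'])) {
        $errors[] = 'No name provided';
    }

    if (empty($post['email'])
        || filter_var($post['email'], FILTER_VALIDATE_EMAIL) === false) {
        $errors[] = 'Invalid email address provided';
    }

    // The URL is optional, so it is only checked when present.
    if (!empty($post['url'])
        && filter_var($post['url'], FILTER_VALIDATE_URL) === false) {
        $errors[] = 'Invalid URL provided';
    }

    return $errors;
}

// Assumption: an extra check that the file begins with the "%PDF-" signature.
// This complements, and does not replace, the finfo MIME check in the handler.
function hasPdfSignature($path)
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $header = fread($handle, 5);
    fclose($handle);
    return $header === '%PDF-';
}
```

Unlike the handler, which stops at the first failed check by throwing an exception, this variant collects every error so that all of them could be reported to the user at once.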

If the input is valid, the PDF parser is used to extract the content of the PDF as a text string via the parseFile() and getText() methods. This string, together with the other inputs, is converted to a PHP array (representing an Elasticsearch document) and the Elasticsearch PHP client is used to index and save this document to Searchly.
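
The document-building step can be sketched as a small pure function. This is an illustrative restatement of the handler's logic, not a replacement for it: buildCvDocument() is a hypothetical name, and $text stands in for the string returned by the PDF parser's getText() method.

```php
<?php
// Hypothetical helper: builds the array that the handler sends to the indexer.
// $text stands in for the string returned by the PDF parser's getText().
function buildCvDocument(array $post, $text)
{
    // Treat missing optional fields as empty strings.
    $field = function ($key) use ($post) {
        return isset($post[$key]) ? strip_tags($post[$key]) : '';
    };

    return [
        'name'    => $field('name'),
        'email'   => $field('email'),
        'content' => $text,
        'url'     => $field('url'),
        'notes'   => $field('notes'),
    ];
}

$document = buildCvDocument(
    ['name' => '<b>Jane Doe</b>', 'email' => 'jane@example.com'],
    'Skills: PHP, Node.js'
);
// $document['name'] is now "Jane Doe"; url and notes are empty strings.
```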

Notice in the listing above that the Elasticsearch PHP client is passed an array of parameters. The body parameter contains the actual content to be indexed, while the index parameter specifies which index to use and the type parameter specifies the document type. Once indexed, the document is assigned a unique ID in the Searchly index.
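
The parameter array described above can be assembled by a helper such as the one sketched below. The function name is hypothetical, and the optional $id argument is an assumption added for illustration: the handler in this tutorial omits it and lets the index generate the identifier.

```php
<?php
// Hypothetical helper: assembles the parameter array described above.
// The optional $id is an assumption added for illustration; the handler in
// this tutorial omits it and lets the index generate the identifier.
function buildIndexParams(array $document, $index = 'cvs', $type = 'doc', $id = null)
{
    $params = [
        'index' => $index,
        'type'  => $type,
        'body'  => $document,
    ];
    if ($id !== null) {
        $params['id'] = $id;
    }
    return $params;
}

$params = buildIndexParams(['name' => 'Jane Doe', 'content' => 'PHP, Node.js']);
// In the application this array would be passed to $this->indexer->index($params).
```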

To see this in action, try adding a CV through the application. If successful, you should see a message like this, which contains the ID for the newly indexed document.

Figure 6. Successful CV upload and indexing
Image shows successful CV upload and indexing

If you then browse back to the Bluemix console, launch the Searchly dashboard and look under Contents, you should see a new record with that ID containing the details for your newly added CV. You can click the record to drill down and view the details of the CV in the index.

Figure 7. Document in Searchly index
Image shows document in Searchly index

Conclusion

At this point, you've indexed the contents of the PDF file, but you'll notice that you haven't actually stored the CV itself in the system. That's where Bluemix Object Storage comes in.

In the second and concluding part of this tutorial series, I'll introduce you to the Bluemix Object Storage service and explain how you can use it to save and download CV files. I'll also walk you through the process of building a search interface for the application and searching for candidates using skills as keywords. Finally, I'll show you how to deploy everything to a secure, robust, and scalable environment in the Bluemix cloud.


publish-date=03082017