Intro to graph databases, Part 2

Building a recommendation engine with a graph database

One of the biggest strengths of graph databases is the ability to quickly and easily generate recommendations for your users. Recommendations make it easier for your users to find what they want, which can, in turn, increase your sales. A win-win!

The Lauren's Lovely Landscapes app currently has a recommendation engine that displays personalized recommendations on the home page. In this tutorial, you'll explore the code behind the existing engine. Then you'll implement a feature to display recommendations on a product's page.

About the app

You will work with a sample online store called Lauren's Lovely Landscapes. The store allows users to browse and purchase prints. The <for developers> page has information specifically about how the app was built and links to insert and delete sample data.

Note: This tutorial uses version 2 of Lauren's Lovely Landscapes. If you have version 1 (which does not have recommendations on the home page), you will need to get a new copy of the code.

Figure: Lauren's Lovely Landscapes home page

What you need to get started

Before you begin, you need to register at IBM Bluemix®. You'll also need the latest version of one of these browsers:

  • Chrome
  • Firefox
  • Internet Explorer
  • Safari

You do not need to have completed Part 1 of this tutorial. However, you do need a deployed copy of version 2 of Lauren's Lovely Landscapes (see Deploy the app from Part 1 for instructions), and you need to know how to access the web IDE for your project (see Open the code from Part 1 for instructions).

Explore the existing recommendation engine

Recommendation engines allow you to provide personalized recommendations to users. Recommendations can be simple (for example, recommending the most purchased item in the last week) or complex (for example, recommending items based on your user's demographics, purchase history, and social media connections).
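
As a rough illustration of the simple end of that spectrum, the following Python sketch picks the most purchased item from the last seven days. The purchase records, the function name, and the dates are all hypothetical; none of this comes from the Lauren's Lovely Landscapes code.

```python
from collections import Counter
from datetime import datetime, timedelta


def most_purchased_last_week(purchases, now):
    """Return the name of the most purchased item in the 7 days before `now`.

    `purchases` is a list of (item_name, purchase_datetime) tuples.
    Returns None when nothing was purchased in that window.
    """
    cutoff = now - timedelta(days=7)
    counts = Counter(name for name, when in purchases if cutoff <= when <= now)
    top = counts.most_common(1)
    return top[0][0] if top else None


# Hypothetical purchase log, invented for this example.
sample_purchases = [
    ("Alaska", datetime(2017, 2, 20)),
    ("Alaska", datetime(2017, 2, 18)),
    ("Japan", datetime(2017, 2, 19)),
    ("Japan", datetime(2017, 1, 2)),  # too old to count
    ("Japan", datetime(2017, 1, 3)),  # too old to count
]

print(most_purchased_last_week(sample_purchases, datetime(2017, 2, 21)))
```

Passing `now` in as an argument, rather than reading the clock inside the function, keeps the sketch easy to test.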

In this section, you'll explore the existing recommendation engine that the Lauren's Lovely Landscapes app uses to display personalized recommendations on its home page.

Explore the theory behind the recommendation engine

There are many ways you can generate recommendations for your users. This app makes recommendations based on the Apache TinkerPop recipe for recommendations. As you'll see below, the Gremlin query in Lauren's Lovely Landscapes is different from the one included in the recipe, but the concepts behind the query are the same.

For a detailed explanation of how the recommendation engine works, watch the video below:

In this section, you'll use the Graph Query Editor to incrementally build the Gremlin query used to generate recommendations.

  1. To ensure that the recommendations you generate match this tutorial, delete the data in your graph and insert the sample data:
    1. Navigate to the <for developers> page of your deployed app.
    2. In The Data section, click the link to delete the data.
    3. In The Data section, click the link to insert the sample data.
      Note: Inserting the data can take a minute or two.
  2. Open the Graph Query Editor for your graph instance in a new browser tab or window:
    1. In a new browser tab or window, navigate to Bluemix.net.
    2. On the dashboard, scroll down to the All Services section and click LaurensLovelyLandscapesSample-Graph.
    3. On the Manage tab (open by default), click Open. The Graph Query Editor opens for your graph instance.
  3. By default, the g graph is selected. Switch to the landscapes_graph, where your data is stored, by clicking the down arrow next to g in the top navigation menu and then clicking landscapes_graph.
  4. Let's build the query for our recommendation incrementally so that each step is explored. Start by creating a new traversal. Then search for the user "dale" and name the resulting vertex "buyer" so you can refer to it later in the query. In the Query Execution Box, type the following Gremlin query:
    def gt = graph.traversal();
    gt.V().hasLabel("user").has("username","dale").as("buyer");
  5. Click the Submit Query button (arrow on white background).
  6. The query results open in a new box. The results show one user vertex for dale.
  7. Add to the existing query. From the user vertex dale that you named buyer, traverse out along all of the "buys" edges to find all of the prints dale has bought. Aggregate the prints together and name them "bought" so you can refer to them later in the query. In the Query Execution Box, type the following Gremlin query:
    def gt = graph.traversal();
    gt.V().hasLabel("user").has("username","dale").as("buyer")
    .out("buys").aggregate("bought");
  8. Click the Submit Query button (arrow on white background).
  9. The query results open in a new box below. You can see the three prints dale has bought: Las Vegas, Australia, and Japan.
  10. Adding onto the existing query, from the collection of prints that dale bought, you'll traverse in along the buys edges to find all of the user vertexes (except for dale, the buyer) who purchased any of the prints that our user did. Use dedup() to remove duplicates because you only need to find each user once. In the Query Execution Box, type the following Gremlin query:
    def gt = graph.traversal();
    gt.V().hasLabel("user").has("username", "dale").as("buyer")
    .out("buys").aggregate("bought")
    .in("buys").where(neq("buyer")).dedup();
  11. Click the Submit Query button (arrow on white background).
  12. The results of the query open in a new box. You can see three users (Jason, Deanna, and Joy) purchased at least one of the same prints as dale.
  13. Continuing to add to the existing query, from the collection of users who purchased at least one of the same prints as dale, traverse out along the buys edges to find the prints that these users have purchased. Exclude the prints that dale bought because you don't need to recommend prints to him that he already bought. In the Query Execution Box, type the following Gremlin query:
    def gt = graph.traversal();
    gt.V().hasLabel("user").has("username", "dale").as("buyer")
    .out("buys").aggregate("bought")
    .in("buys").where(neq("buyer")).dedup()
    .out("buys").where(without("bought"));
  14. Click the Submit Query button (arrow on white background).
  15. The query results open in a new box below. You can see two prints, Antarctica and Alaska, were purchased by the collection of users. Note that the JSON results on the left list the prints multiple times; each time the print is listed represents a purchase. The visual summary on the right only displays each print once.
  16. Adding to the existing query, now that you know the recommended prints, you need to group them together by name, sort them, and list only the top three. In the Query Execution Box, type the following Gremlin query:
    def gt = graph.traversal();
    gt.V().hasLabel("user").has("username", "dale").as("buyer")
    .out("buys").aggregate("bought")
    .in("buys").where(neq("buyer")).dedup()
    .out("buys").where(without("bought"))
    .groupCount().by('name').order(local).by(valueDecr).limit(local, 3);
  17. Click the Submit Query button (arrow on white background).
  18. The query results open in a new box. You'll see the Alaska print was purchased three times (and is, therefore, our top recommendation), and the Antarctica print was purchased twice (and is, therefore, our second recommendation).
  19. At this point, you could be finished because you have generated an ordered list of recommendations. However, in order for the app to display the image associated with each recommendation, you will need the imgPath property in addition to the name property. Update the query to add a new function named byNameImgPath that handles storing both the image name and image path in the query results. Replace by('name') with by(byNameImgPath) to call this new function. In the Query Execution Box, type the following Gremlin query:
    def gt = graph.traversal();
    java.util.function.Function byNameImgPath = { Vertex v -> "" + v.value("name") + ":" + v.value("imgPath") };
    gt.V().hasLabel("user").has("username", "dale").as("buyer")
    .out("buys").aggregate("bought")
    .in("buys").where(neq("buyer")).dedup()
    .out("buys").where(without("bought"))
    .groupCount().by(byNameImgPath).order(local).by(valueDecr).limit(local, 3);
  20. Click the Submit Query button (arrow on white background).
  21. The query results open in a new box below. You can see that the results now display the imgPath in addition to the name. The query is ready!
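
To make the logic of the traversal easier to follow outside the Graph Query Editor, here is a minimal Python sketch that walks the same steps over an in-memory dictionary. The purchase data is hypothetical: it was invented to be consistent with the results described in the steps above and is not the app's actual sample data. The function name is also an assumption.

```python
from collections import Counter

# Hypothetical purchase data, invented to be consistent with the results
# described in the steps above. It is NOT the app's actual sample data.
PURCHASES = {
    "dale": ["Las Vegas", "Australia", "Japan"],
    "jason": ["Alaska", "Las Vegas", "Antarctica", "Japan"],
    "deanna": ["Alaska", "Las Vegas", "Antarctica", "Australia"],
    "joy": ["Alaska", "Las Vegas"],
}


def recommend_for_user(purchases, buyer, limit=3):
    # .out("buys").aggregate("bought"): everything the buyer already owns
    bought = set(purchases.get(buyer, []))

    # .in("buys").where(neq("buyer")).dedup(): other users who share a print
    co_buyers = [
        user for user, prints in purchases.items()
        if user != buyer and bought.intersection(prints)
    ]

    # .out("buys").where(without("bought")): prints the buyer lacks,
    # counted once per purchase, as groupCount() does
    counts = Counter(
        name
        for user in co_buyers
        for name in purchases[user]
        if name not in bought
    )

    # .order(local).by(valueDecr).limit(local, 3)
    return counts.most_common(limit)


print(recommend_for_user(PURCHASES, "dale"))
# [('Alaska', 3), ('Antarctica', 2)]
```

With this invented data, the sketch returns Alaska (three purchases) ahead of Antarctica (two purchases), matching the ranking from step 18.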

Explore the code behind the recommendation engine

Now that you understand the query the recommendation engine uses, take a look at the code:

  1. Open the web IDE for your project (see Open the code from Part 1 for instructions).
  2. In the left navigation pane, click graph.py to open it.
  3. In graph.py, locate the getRecommendedPrints() function around line 43.
  4. The code begins by ensuring the username variable exists so the query can run.
  5. Next, the function creates a new dictionary that contains a Gremlin query. The query is based on the one you wrote in the section above with one difference: Instead of querying for username dale, the code uses dynamic input based on the username argument passed in to the function.
  6. After the dictionary containing the query is created, the function is ready to call the Graph API. Around line 77, the function makes a new POST request to /gremlin and sends the dictionary containing the Gremlin query as part of the request.
  7. Around line 79, the function checks whether the request was successful (200 response code). If it was, the function processes the results (the recommendations). The results the function receives are sorted; however, the sorting is lost when the results are processed using json.loads(), so the function re-sorts the results. The function stores important information from the sorted results in prints.
  8. The home page for Lauren's Lovely Landscapes aims to display three personalized recommendations. The query you just observed might not generate three recommendations, for one of two reasons: (1) the user is not authenticated, or (2) the user has already purchased all or nearly all of the prints that the similar users have purchased, so fewer than three prints are left to recommend. Around line 94, the function checks if it has three recommendations. If it does, it returns the recommended prints. If not, it continues in an attempt to generate more recommendations.
  9. Around line 100, the function updates the gremlin dictionary to have a new query that searches for the prints most purchased by all of the users.
  10. Around line 116, the function checks whether the request was successful (200 response code). If it was, the function processes the results (the recommendations). As the function did above, it re-sorts the results. Then it stores the results in prints, checking to make sure it's not storing a duplicate recommendation from the first query results. When the function has finished processing the results, the prints are returned.
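
The re-sorting and the fallback described in steps 7 through 10 can be sketched roughly as follows. This is not the code from graph.py: the helper names, the image file names, and the decision to cap the list at three entries are assumptions made for illustration. The input shape (a list holding one dictionary keyed by "name:imgPath") follows the query results shown earlier.

```python
from operator import itemgetter


def parse_counts(data):
    """Turn [{"name:imgPath": count, ...}] into a list of print dicts,
    sorted by count in descending order."""
    if not data:
        return []
    ranked = sorted(data[0].items(), key=itemgetter(1), reverse=True)
    prints = []
    for key, _count in ranked:
        # Split on the first colon only, in case the path contains one.
        name, img_path = key.split(":", 1)
        prints.append({"name": name, "imgPath": img_path})
    return prints


def top_up(personal, popular, wanted=3):
    """Fill the personalized list with popular prints, skipping duplicates."""
    prints = list(personal)
    seen = {p["name"] for p in prints}
    for p in popular:
        if len(prints) >= wanted:
            break
        if p["name"] not in seen:
            prints.append(p)
            seen.add(p["name"])
    return prints


# Hypothetical query results; the file names are invented.
personal = parse_counts([{"Antarctica:antarctica.jpg": 2, "Alaska:alaska.jpg": 3}])
popular = parse_counts([{"Alaska:alaska.jpg": 3, "Las Vegas:lasvegas.jpg": 4, "Japan:japan.jpg": 2}])

print([p["name"] for p in top_up(personal, popular)])
# ['Alaska', 'Antarctica', 'Las Vegas']
```

Personalized results stay at the front of the list, and the popular prints only fill whatever slots remain.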

Implement new recommendations

Now that you know the basics of building a recommendation engine, it's time to code! In this section, you'll update a product's page to display recommendations based on what other users who bought that print also bought.

To gain a better understanding of the concepts behind this new recommendation engine, watch the video at the top of this tutorial.

Build the query for the new recommendation engine

  1. Open your browser tab or window that has the Graph Query Editor (see step 2 of Explore the theory behind the recommendation engine). Ensure the landscapes_graph is selected (see step 3 of the same section).
  2. Let's build the query for our recommendation incrementally so that we can discuss each step. Start by creating a new traversal. Then you'll search for the print 'Alaska' and name the resulting vertex currentPrint so you can refer to it later. In the Query Execution Box, input the following Gremlin query:
    def gt = graph.traversal();
    gt.V().hasLabel("print").has("name", "Alaska").as("currentPrint");
  3. Click the Submit Query button (arrow on white background).
  4. The query results open in a new box below. You can see one print vertex for Alaska.
  5. Add to the existing query. From the print vertex Alaska, traverse in along the buys edges to find all of the users who have bought that print. In the Query Execution Box, type the following Gremlin query:
    def gt = graph.traversal();
    gt.V().hasLabel("print").has("name", "Alaska").as("currentPrint")
    .in("buys");
  6. Click the Submit Query button (arrow on white background).
  7. The query results open in a new box below. You can see that three users bought the Alaska print: Jason, Deanna, and Joy.
  8. Continue to add to the existing query. From the collection of users who bought Alaska, traverse out along the buys edges to find all the prints these users have bought, excluding Alaska ('currentPrint'). In the Query Execution Box, type the following Gremlin query:
    def gt = graph.traversal();
    gt.V().hasLabel("print").has("name", "Alaska").as("currentPrint")
    .in("buys").out("buys").where(neq("currentPrint"));
  9. Click the Submit Query button (arrow on white background). The query results open in a new box below. You can see the users have purchased four prints: Antarctica, Australia, Las Vegas, and Japan. Note that the JSON results on the left list the prints multiple times: each time the print is listed represents a purchase. The visual summary on the right only displays each print once.
  10. Add on to the existing query. Now that you know the recommended prints, you need to group them together by name, sort them, and list the top three. In the Query Execution Box, input the following Gremlin query:
    def gt = graph.traversal();
    gt.V().hasLabel("print").has("name", "Alaska").as("currentPrint")
    .in("buys")
    .out("buys").where(neq("currentPrint"))
    .groupCount().by('name').order(local).by(valueDecr).limit(local, 3);
  11. Click the Submit Query button (arrow on white background).
  12. The query results open in a new box below. Las Vegas was purchased three times, Antarctica was purchased two times, and Japan was purchased one time. Australia was also purchased one time, so it is an equally valid recommendation. You could optionally update the query to specify the sort order for prints that have been purchased the same number of times, but we will skip this for now.
  13. At this point, you could be finished since you've generated an ordered list of recommendations. However, for the app to display the image associated with each recommendation, you'll need the imgPath property in addition to the name property. Update the query to add a new function named byNameImgPath that handles storing both the image name and image path in the query results. Replace by('name') with by(byNameImgPath) to call this new function. In the Query Execution Box, type the following Gremlin query:
    def gt = graph.traversal();
    java.util.function.Function byNameImgPath = { Vertex v -> "" + v.value("name") + ":" + v.value("imgPath") };
    gt.V().hasLabel("print").has("name", "Alaska").as("currentPrint")
    .in("buys")
    .out("buys").where(neq("currentPrint"))
    .groupCount().by(byNameImgPath).order(local).by(valueDecr).limit(local, 3);
  14. Click the Submit Query button (arrow on white background).
  15. The query results open in a new box below. The results now display the imgPath in addition to the name. The query is ready!
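
Step 12 notes that ties (Japan and Australia, one purchase each) are not ordered by the query. As a rough sketch of one way to make the ordering deterministic, the following Python function mirrors the traversal over in-memory data and breaks ties alphabetically by print name. The purchase data is hypothetical (invented to match the counts reported above), and the alphabetical tie-break is an arbitrary choice for illustration, not something the app does.

```python
from collections import Counter

# Hypothetical purchase data, invented to match the counts reported in the
# steps above. It is NOT the app's actual sample data.
PURCHASES = {
    "dale": ["Las Vegas", "Australia", "Japan"],
    "jason": ["Alaska", "Las Vegas", "Antarctica", "Japan"],
    "deanna": ["Alaska", "Las Vegas", "Antarctica", "Australia"],
    "joy": ["Alaska", "Las Vegas"],
}


def commonly_purchased_with(purchases, print_name, limit=3):
    counts = Counter()
    for prints in purchases.values():
        if print_name in prints:            # .in("buys")
            for name in prints:             # .out("buys")
                if name != print_name:      # .where(neq("currentPrint"))
                    counts[name] += 1       # .groupCount()
    # Sort by count (descending), then by name (ascending) to break ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


print(commonly_purchased_with(PURCHASES, "Alaska"))
# [('Las Vegas', 3), ('Antarctica', 2), ('Australia', 1)]
```

With an alphabetical tie-break, Australia takes the third slot instead of Japan; any other stable rule (for example, most recent purchase) would work just as well.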

Write the code for the new recommendation engine

Now that you have confirmed the query successfully generates recommendations, it's time to code!

  1. In the file navigation pane of the web IDE you left open in another tab or window, click graph.py to open it.
  2. Above the getRecommendedPrints() function around line 42, paste the following code:
def getCommonlyPurchasedPrints(printName):
  
  # Generate a list of commonly purchased prints by searching for what
  # the people who have bought this print also purchased
  gremlin = {
    # create a new traversal
    "gremlin": "def gt = graph.traversal();" + 
      # create a function that handles storing both the image name and image path in the results
      "java.util.function.Function byNameImgPath = { Vertex v -> \"\" + v.value(\"name\") + \":\" + v.value(\"imgPath\") };" +
      # search for the node of the designated print and name it "currentPrint"
      "gt.V().hasLabel(\"print\").has(\"name\", \"" + printName + "\").as(\"currentPrint\")" +
      # go in to find all of the users who bought the designated print
      ".in(\"buys\")" + 
      # go out to find all prints (excluding the designated print) that these users purchased
      ".out(\"buys\").where(neq(\"currentPrint\"))" +  
      # group and sort to find the top 3 most commonly purchased prints
      ".groupCount().by(byNameImgPath).order(local).by(valueDecr).limit(local, 3);"
    }  
  
  response = post(constants.API_URL + '/' + constants.GRAPH_ID + '/gremlin', json.dumps(gremlin))

  if (response.status_code == 200): 
    results = json.loads(response.content)['result']['data']
    if len(results) > 0:
      results = results[0]
      # We lose the sorting from the query results when we do json.loads.
      # Sort the results in descending order by value.
      results = sorted(results.items(), key=itemgetter(1), reverse=True)
      prints = []      
      for p in results: 
        newPrint = {}
        newPrint['name'] = p[0].split(':', 1)[0]
        newPrint['imgPath'] = p[0].split(':', 1)[1]
        prints.append(newPrint)
        print 'Found print commonly purchased with %s: %s' % (printName, newPrint['name'])
      return prints
        
  raise ValueError('An error occurred while getting a list of commonly purchased prints for print %s: %s %s.' % (printName, response.status_code, response.content))

Note: Spacing is important when programming in Python. Be sure you are using spaces to indent the code appropriately.

Let's examine what this function does. First, the code creates a dictionary that holds the Gremlin query. This query is very similar to the one you built in the section above, except that instead of querying for the print called Alaska, the code uses dynamic input based on the printName argument passed in to the function. Second, the code sends the query in a POST request to the /gremlin endpoint of the Graph API. Third, the code processes the results. If the response code is 200, the query was successful. Because json.loads() loses the sorting from the query results, the code re-sorts the results in descending order by purchase count. The code creates a new list named prints where the information about the recommended prints is stored. Then the code loops through each result, splitting each key on the first colon to recover the name and imgPath for each recommended print. Finally, if everything goes well, the code returns prints. If the request fails or returns no results, the code raises a ValueError.
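
Because printName is concatenated directly into the query string, a name containing a double quote would break the query. The following is a minimal sketch of one way to build the same query string with basic escaping. The helper names are hypothetical, and the escaping shown (backslash, double quote, and dollar sign) is an assumption about what a double-quoted Groovy string needs; treat it as an illustration rather than a complete defense against malicious input.

```python
def escape_groovy(value):
    """Escape characters that would end or alter a double-quoted Groovy string."""
    return (
        value.replace("\\", "\\\\")   # backslashes first, so later escapes survive
        .replace('"', '\\"')          # double quotes would close the string
        .replace("$", "\\$")          # dollar signs trigger string interpolation
    )


def build_commonly_purchased_query(print_name):
    """Return the Gremlin query text for the given print name."""
    return (
        'def gt = graph.traversal();'
        'java.util.function.Function byNameImgPath = '
        '{ Vertex v -> "" + v.value("name") + ":" + v.value("imgPath") };'
        'gt.V().hasLabel("print").has("name", "%s").as("currentPrint")'
        '.in("buys")'
        '.out("buys").where(neq("currentPrint"))'
        '.groupCount().by(byNameImgPath).order(local).by(valueDecr).limit(local, 3);'
    ) % escape_groovy(print_name)


# The returned string would be the value of the "gremlin" key in the
# dictionary that is posted to the /gremlin endpoint.
gremlin = {"gremlin": build_commonly_purchased_query("Alaska")}
print(gremlin["gremlin"])
```

Using single-quoted Python strings for the query fragments also avoids the backslash-escaped double quotes that make the concatenated version harder to read.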

  1. Now that the back-end code is complete, it's time to get the recommended prints to the front end. In the left navigation pane in the web IDE, click wsgi.py to open it.
  2. Locate the getPrint() function around line 73. This function is called whenever a user accesses a print's details page. The last line of the try statement in the function returns the template for the print page. Update the arguments that are sent to bottle.template() in the try statement so the recommended prints are included:
    return bottle.template('print',
        username = request.get_cookie("account", secret=constants.COOKIE_KEY),
        printInfo = printInfo,
        commonlyPurchasedPrints = graph.getCommonlyPurchasedPrints(printName))
    Hint: Be sure the spacing before return remains the same.
  3. Now that the print template has access to the recommended prints, you need to update it to display them. In the left navigation pane in the web IDE, expand the views directory and click print.tpl to open it.
  4. After the closing tag of the form but before the line to include the footer around line 28, paste the following code:
		% if len(commonlyPurchasedPrints) > 0:
			<h3>Users who ordered this print also ordered...</h3>
			<div class='container'>
				<div class='row'>
					% for p in commonlyPurchasedPrints:
						<div class="preview span3">
							<a href="{{p['name']}}">
								{{p['name']}}<br>
								<img src="/static/images/{{p['imgPath']}}" class="thumb">
							</a>
						</div>
					% end
				</div>
			</div>
		% end

Let's examine what this code does. The code begins by checking whether there is at least one commonly purchased print to recommend. If so, the code creates a level-three heading with the text "Users who ordered this print also ordered...". The code then loops through the commonly purchased prints, displaying the name and image for each. Each name and image is wrapped in a link to that print's page.

Deploy the code for the new recommendation engine

Now that you've written the code for the new recommendation engine, let's test it:

  1. In the toolbar at the top of the web IDE, click the Deploy the App from the Workspace button (white triangle pointing right on a black background). The deploy might take a minute or two. When the app is deployed, a green status dot appears beside your app's name in the toolbar.
  2. When the app is deployed, click the Open the Deployed App button (deploy app) in the toolbar at the top of the web IDE. The deployed version of Lauren's Lovely Landscapes opens.
  3. In your deployed app, click Alaska to open the Alaska print's details page.
  4. Scroll down to see the recommendations feature you just implemented! Congrats!

What next?

Take a moment to reflect on all that you did in this tutorial. You began by learning the theory behind an existing recommendation engine, exploring the query behind the recommendation engine line by line, and taking a look at the code that displays the recommendations. Then you implemented your own recommendation engine that displays commonly purchased products and got to see the feature running live on Bluemix.

Now that you know the basics, you can go on to create even better recommendations. Perhaps you want to limit the recommendations to only recent purchases. Or maybe you want to group users together based on more than just commonly purchased prints — maybe you want to consider a user's demographics or social connections. Or maybe you want to use Watson's Visual Recognition to uncover similarities in the prints themselves and make recommendations based on those similarities. The options are endless. With the power of the Gremlin graph traversal language and the ease of IBM Graph, you have the power to create fabulous, customized recommendation engines.


ArticleTitle=Intro to graph databases, Part 2: Building a recommendation engine with a graph database
publish-date=02212017