Build a mobile app to analyze other apps with Bluemix, Watson Discovery, and Cloudant

Wanting to learn more about data analytics, I decided to take advantage of the recently released Watson Discovery service. I wanted to see whether I could quickly learn anything of value from the reviews that are given for the top 10 free apps in the iTunes App Store. I chose Python to extract and crawl the app reviews because of its simplicity and speed. I wanted to create a mobile app to display all the data I found, so I used Swift to display the information. App Insights is a mobile app that runs on top of IBM Bluemix, uses the Watson Discovery service to provide an analysis of the reviews, and uses Cloudant to store the app details.

Overview of the App Insights app

My app features:

  • Sentiment over time: Sentiment based on specific target phrases Watson Discovery finds in each review.
  • Keywords: Important topics that are pulled from each review.
  • Opportunities: Reviews that are returned by querying the Discovery service. Each review represents an opportunity for you to discover which features to implement, change, or delete.

Prerequisites

You must have the following prerequisites to build the app:

Set up back-end configuration

I use Python to extract, parse, and upload reviews that are grabbed from the App Store's RSS feeds. The scripts clean the data, then upload the app details into Cloudant and the app reviews into the Discovery service.

  1. Install the following third-party dependencies that the Python scripts require:
    pip install --upgrade watson-developer-cloud
    pip install cloudant
    pip install -U python-dotenv
    pip install beautifulsoup4
    pip install lxml
    pip install cssselect
  2. Insert your Watson Discovery and Cloudant credentials into the Scripts/.env file:
    DISCOVERY_USERNAME="YOUR-DISCOVERY-USERNAME-HERE"
    DISCOVERY_PASSWORD="YOUR-DISCOVERY-PASSWORD-HERE"
    DISCOVERY_VERSION="2016-12-15"
    COLLECTION_NAME="EXAMPLE-COLLECTION-NAME"
    CLOUDANT_USERNAME="YOUR-CLOUDANT-USERNAME-HERE"
    CLOUDANT_PASSWORD="YOUR-CLOUDANT-PASSWORD-HERE"
    DATABASE_NAME="EXAMPLE-DB-NAME"

    Name your Watson Discovery collection and Cloudant database anything you want. If the Discovery collection or Cloudant database instance doesn't exist, the Python script creates them for you (assuming that the credentials for each service were completed correctly).

  3. Run these two scripts in this order to load Cloudant and Discovery with information:
    python Scripts/ingest_reviews.py
    python Scripts/extract_upload_app_details.py

    The ingest_reviews.py script crawls the reviews of the top 10 free apps by extracting them from the App Store's RSS feeds, then ingests the reviews into the Discovery service to enrich them. The extract_upload_app_details.py script extracts general app details: the app name, description, URL, number of reviews, and rating. This information is then stored in Cloudant to be called by the app.
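Before running the scripts, it can help to confirm that Scripts/.env defines every key listed in step 2. The sketch below is a standard-library stand-in for the loading that python-dotenv performs; the helper names parse_env and missing_keys are hypothetical and are not part of the project's scripts.

```python
# Hypothetical helpers for illustration; not part of the project's scripts.
REQUIRED_KEYS = [
    "DISCOVERY_USERNAME", "DISCOVERY_PASSWORD", "DISCOVERY_VERSION",
    "COLLECTION_NAME", "CLOUDANT_USERNAME", "CLOUDANT_PASSWORD",
    "DATABASE_NAME",
]


def parse_env(text):
    """Parse KEY="VALUE" lines into a dict, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in listed order."""
    return [key for key in required if not values.get(key)]


# A deliberately incomplete file: the Cloudant entries are left out.
sample = '''
DISCOVERY_USERNAME="YOUR-DISCOVERY-USERNAME-HERE"
DISCOVERY_PASSWORD="YOUR-DISCOVERY-PASSWORD-HERE"
DISCOVERY_VERSION="2016-12-15"
COLLECTION_NAME="EXAMPLE-COLLECTION-NAME"
'''
print(missing_keys(parse_env(sample)))
```

Running the check against the incomplete sample lists the three Cloudant keys that still need values.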

Set up front-end configuration

  1. Install third-party dependencies by using Carthage. Note that this command can take up to 20 minutes the first time you run it.
    cd app-insights-iOS
    carthage update --platform iOS

    This command pulls all dependencies through Carthage. The app uses the Graphs, Watson Developer Cloud Swift SDK, and SwiftyJSON libraries.

  2. Insert your Discovery and Cloudant user names and passwords into the app-insights-iOS/app-insights/Configuration.swift file. Use the same Discovery collection name and Cloudant database name that you created when running the Python scripts.
    import Foundation
    public struct Credentials {
        static let DiscoveryUsername = "your-discovery-username-here"
        static let DiscoveryPassword = "your-discovery-password-here"
        static let EnvironmentName = "your-discovery-environment-name"
        static let CollectionName = "your-collection-name"
        static let DiscoveryVersion = "2017-02-14"
        static let CloudantUsername = "your-cloudant-username-here"
        static let CloudantPassword = "your-cloudant-password-here"
        static let AppsDBName = "your-cloudant-database-name"
    }

Press Build and run to see the app running in Xcode's iPhone simulator.

Architecture explained

The Python scripts set up a static copy of information that is extracted from the App Store. The scripts visit the RSS feed of each top app found on the iTunes top charts page to grab that app's reviews. The extract_reviews.py script cleans every review and uploads it as a document into a collection created within the Discovery service. Python's ElementTree library extracts the relevant information from the feed's XML data and turns it into multiple JSON documents like the following:

{
"review": "It's a fun park management game. Just at the moment when you found yourself enjoying the game, poof, all your progress was gone and Atari reassigned you a new player ID.",
"review_id": 1553629663,
"version": "1.07",
"updated": "2017-02-27T01:41:00-07:00",
"rating": 2,
"app_name": "RollerCoaster Tycoon® Touch™v2",
"title": "Fun game till all progress is lost"
}
Overview of extracting the reviews
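The XML-to-JSON step can be sketched with the standard library alone. The feed snippet below is a trimmed, assumed approximation of one review entry (the real feed carries more fields, and its element names should be checked against a live response); entry_to_document and the app name are hypothetical and are not taken from the project's code.

```python
import json
import xml.etree.ElementTree as ET

# Namespaces assumed for the review feed: Atom plus Apple's "im" extension.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "im": "http://itunes.apple.com/rss",
}

# Assumed, simplified stand-in for one entry of an App Store review feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:im="http://itunes.apple.com/rss">
  <entry>
    <updated>2017-02-27T01:41:00-07:00</updated>
    <id>1553629663</id>
    <title>Fun game till all progress is lost</title>
    <content type="text">It's a fun park management game.</content>
    <im:rating>2</im:rating>
    <im:version>1.07</im:version>
  </entry>
</feed>"""


def entry_to_document(entry, app_name):
    """Turn one <entry> element into a review document (hypothetical helper)."""
    def text(path):
        node = entry.find(path, NS)
        return node.text.strip() if node is not None and node.text else ""

    return {
        "review": text("atom:content"),
        "review_id": int(text("atom:id")),
        "version": text("im:version"),
        "updated": text("atom:updated"),
        "rating": int(text("im:rating")),
        "app_name": app_name,
        "title": text("atom:title"),
    }


root = ET.fromstring(SAMPLE_FEED)
documents = [entry_to_document(entry, "Example App")
             for entry in root.findall("atom:entry", NS)]
print(json.dumps(documents[0], indent=2))
```

Each entry becomes one dictionary with the same keys as the JSON document shown earlier, ready to be serialized and uploaded.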

After the extract_reviews.py script finishes running, it creates an ingested_apps.txt text file to log which apps were ingested by the Discovery service. This text file is important because not all apps have RSS feeds. Apps without RSS feeds have no reviews to extract, so these apps are skipped and the next most popular app's reviews are grabbed instead. The extract_upload_app_details.py script reads this text file and sends more URL requests to the App Store to grab each app's details. Using the Cloudant Python SDK, the following JSON is uploaded as a document into Cloudant.

{
"name": "Bitmoji - Your Personal Emoji",
"description": "Bitmoji is your own personal emoji. Create an expressive cartoon avatar. Choose from a huge library of stickers all featuring YOU. Use Bitmoji in Snapchat, iMessage and wherever else you chat. Using Bitmoji in Snapchat unlocks friendmoji 2-person bitmojis featuring you and your friends!",
"imageURL": "http://is4.mzstatic.com/image/thumb/Purple122/v4/2e/53/b3/2e53b39c-5101-94df-5cdf-88e331bc594e/source/1200x630bb.jpg",
"category": "Utilities",
"rating": 4.5,
"numberOfReviews": 36907,
"topKeyword": "keyboard",
"numberOfTurnarounds": 16,
"appSentimentValue": -0.079542891986754932
}
Image of app overview

If you look at the document above, you see that the fields topKeyword, numberOfTurnarounds, and appSentimentValue come from the Discovery service rather than the App Store itself. These fields summarize each app on the home page of the app I'm creating. Rather than doubling the number of requests to the Discovery and Cloudant services when the app loads, I store in Cloudant a summary of what users can see upon clicking each cell.
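The skip-and-log behavior described earlier for ingested_apps.txt can be sketched as follows. This is only an illustration under stated assumptions: fetch_reviews is a hypothetical callable standing in for the RSS request, the app names are made up, and the one-name-per-line log format is a guess rather than the project's actual format.

```python
import io


def select_apps_with_reviews(ranked_apps, fetch_reviews, limit=10):
    """Walk the chart in rank order and keep the first `limit` apps that have reviews."""
    ingested = []
    for app in ranked_apps:
        if len(ingested) == limit:
            break
        if not fetch_reviews(app):
            continue  # no feed or no reviews: move on to the next most popular app
        ingested.append(app)
    return ingested


def write_log(handle, apps):
    """Write one app name per line (assumed log format)."""
    for app in apps:
        handle.write(app + "\n")


# Made-up chart data: "App B" has no reviews, so "App C" takes its place.
fake_feed = {"App A": ["great"], "App B": [], "App C": ["ok", "bad"], "App D": ["fine"]}
chosen = select_apps_with_reviews(
    ["App A", "App B", "App C", "App D"],
    lambda app: fake_feed.get(app, []),
    limit=2,
)

log = io.StringIO()  # stands in for ingested_apps.txt
write_log(log, chosen)
print(chosen)
```

Keeping the list of ingested apps separate from the ranking is what lets a second script fetch details for exactly the apps that ended up in the collection.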

The most challenging part of creating the app was figuring out if the data extracted contained interesting trends or content. Thankfully, the Discovery service has a neat tool to not only view the collections that are created, but also try out custom configurations, queries, and aggregations.

iOS client-side code

Now that the back end is set up, we can explore which queries and aggregations were used to create the app. By including the Watson Developer Cloud dependency within Carthage, I was able to take advantage of the Swift SDK for Watson and use its Discovery service.

Let's walk through the app-insights-iOS/app-insights/DiscoveryManager.swift file, which lets the app grab the key data that I want to present to users. After defining the constants that access the credentials I entered into the Configuration/Credentials.swift file, I create a Discovery singleton so that only one instance of this class exists. The idea behind the singleton pattern is to control both the instantiation of and access to the DiscoveryManager. I took this approach because the app needs only one DiscoveryManager instance, whose query methods the various view controllers call to work with the data received from the Discovery service.

As an aside, there are many schools of thought on using singletons. It's commonplace to see Apple use them. However, singletons can act as global state, and object-oriented programming 101 states that global variables are bad. Global variables can produce implicit coupling, where the interface between the definition of the object and its usage depends on mutual knowledge that's not explicitly captured in the interface of the object. In other words, hidden dependencies might exist. Scoping issues might also occur because singletons manage their own lifecycle. However, the current app must access the DiscoveryManager and CloudantManager because I need the data they provide throughout the app.

So, how do I use the Swift SDK's Discovery service? Within the DiscoveryManager, I first import DiscoveryV1 to access the SDK's Discovery methods. I instantiate a Discovery object within the setupDiscovery method, which is called when the app home page appears, to set up the singleton instance and lazily instantiate the class.

func setupDiscovery(onSuccess success: @escaping () -> Void, onFailure failure: @escaping (DiscoveryErrors) -> Void) {
    // Instantiate a Discovery instance using our Credentials.
    discovery = Discovery(
        username: Credentials.DiscoveryUsername,
        password: Credentials.DiscoveryPassword,
        version: kDiscoveryVersion)

After instantiating a Discovery instance, I need to grab the environment that the collection of data is stored in. I call on the getEnvironments method with the environment name I created, and store the returned, corresponding environmentID as a fileprivate var. I can now access this variable throughout the DiscoveryManager, which I need because all methods depend on this environment ID. Otherwise, I return a failure:

    // Fetch environment
    discovery.getEnvironments(withName: kEnvironmentName,
        failure: { error in
            print("Error - getEnvironments: \(error)")
            failure(.other(error.localizedDescription))
        },
        success: { environments in
            print("Environments: \(environments)")
            if let environmentID = environments.first?.environmentID {
                self.environmentID = environmentID
                // Fetch collection
                self.getCollectionID(onSuccess: success, onFailure: failure)
            } else {
                failure(DiscoveryErrors.noEnvironments)
            }
        })
}

In the success callback, I can now find the data collection that is attached to the environment by calling on getCollectionID, which depends on having a valid environment ID. Thus, I call the getCollectionID method after I set the environmentID variable to what the service returns. After you load the app and click a collection cell on the home screen, you're brought to the AppDetails view, which segues into the GraphViewController. This view controller queries the Discovery service for sentiment data upon loading by using the DiscoveryManager's singleton instance to call on its queryForSentiment method:

func queryForSentiment(appName: String, onSuccess success: @escaping ([GraphSentiment]) -> Void, onFailure failure: @escaping (DiscoveryErrors) -> Void) {
    discovery.queryDocumentsInCollection(
        withEnvironmentID: environmentID,
        withCollectionID: collectionID,
        withAggregation: "filter(app_name:\(appName)).timeslice(updated,1day).term(review_enriched.docSentiment.type)",
        return: "aggregations",
        failure: { error in
            failure(.other(error.localizedDescription))
        },
        success: { response in
            if let responseData = try? JSONSerialization.data(withJSONObject: response.json, options: []) {
                var graphSentiments = [GraphSentiment]()
                let json = JSON(data: responseData)
                // Unwrap first aggregation response returned by Discovery service. Unwrapping over response index and corresponding value.
                guard let (_, firstAggregation) = json["aggregations"].first else {
                    failure(DiscoveryErrors.unexpectedJSON)
                    return
                }
                // Safely unwrap second part of aggregation json response returned by Discovery service.
                guard let (_, secondAggregation) = firstAggregation["aggregations"].first else {
                    failure(DiscoveryErrors.unexpectedJSON)
                    return
                }

                let timeSlice = secondAggregation["results"]
                for (_, timeSliceInterval) in timeSlice {
                    let time = timeSliceInterval["key_as_string"].stringValue
                    var positiveSentiment = Sentiment(type: "positive", matchingResults: 0)
                    var negativeSentiment = Sentiment(type: "negative", matchingResults: 0)

                    // Iterating over array's index and its corresponding value
                    guard let (_, timeSliceIntervalResults) = timeSliceInterval["aggregations"].first else {
                        failure(DiscoveryErrors.unexpectedJSON)
                        return
                    }
                    for (_, sentiment) in timeSliceIntervalResults["results"] {
                        guard let matchingResults = Int(sentiment["matching_results"].stringValue) else {
                            failure(DiscoveryErrors.stringToIntFailed)
                            break
                        }
                        if sentiment["key"] == "positive" {
                            positiveSentiment.matchingResults = matchingResults
                        }
                        if sentiment["key"] == "negative" {
                            negativeSentiment.matchingResults = matchingResults
                        }
                    }
                    let graphSentiment = GraphSentiment(date: time, positiveSentiment: positiveSentiment, negativeSentiment: negativeSentiment)
                    graphSentiments.append(graphSentiment)
                    success(graphSentiments)
                }

            }
        })
}

Let's break down the method:

func queryForSentiment(appName: String, onSuccess success: @escaping ([GraphSentiment]) -> Void, onFailure failure: @escaping (DiscoveryErrors) -> Void)

This function definition takes the name of the app I want to query for, plus escaping success and failure closures that let the app handle each case. The @escaping attribute allows a closure to outlive the method call, which is necessary here because the closures run only after the Discovery service responds. Upon success, I pass an array of GraphSentiment objects to the caller of queryForSentiment. Otherwise, I pass a DiscoveryErrors value. Swift 3 makes closure parameters non-escaping by default, which helps avoid closures accidentally escaping and causing retain cycles.

discovery.queryDocumentsInCollection(
    withEnvironmentID: environmentID,
    withCollectionID: collectionID,
    withAggregation: "filter(app_name:\(appName)).timeslice(updated,1day).term(review_enriched.docSentiment.type)",
    return: "aggregations",
    failure: { error in
        failure(.other(error.localizedDescription))
    },
    success: { response in

I call on the Discovery service's method to query documents. I use the environmentID and collectionID that I defined upon setting up the discovery instance, then I specify which aggregation I want to run on the data.

withAggregation: "filter(app_name:\(appName)).timeslice(updated,1day).term(review_enriched.docSentiment.type)",
return: "aggregations",

The Discovery service has its own query language that I need to follow. I specify which app I want to query for within the collection by filtering on the app name: filter(app_name:\(appName)). This tells the Discovery service to filter on the key app_name for any values that match the appName string I passed into the function call. The '.' chains other aggregations to build on top of what's returned by the filter. For the matching reviews, I create a timeslice on the updated field with an interval of one day, which returns JSON that provides the information to build a graph of sentiment over time. I then aggregate each interval with the term keyword, which counts the most frequent document sentiment types: positive, neutral, or negative. In sum, this aggregation returns the number of positive, neutral, and negative reviews per day for the specified app.
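To make the chaining concrete, here is a small sketch that assembles the same aggregation string outside the app. The function name sentiment_aggregation is hypothetical, and the sketch assumes the app name needs no escaping, which may not hold for names that contain characters the query language treats specially.

```python
def sentiment_aggregation(app_name, interval="1day"):
    """Build the filter -> timeslice -> term chain used for the sentiment graph."""
    return (
        f"filter(app_name:{app_name})"                # keep only this app's reviews
        f".timeslice(updated,{interval})"             # bucket them by review date
        ".term(review_enriched.docSentiment.type)"    # count sentiment types per bucket
    )


print(sentiment_aggregation("Google Maps - Navigation & Transit"))
```

Changing the interval argument is the only edit needed to bucket the same reviews over a different period, provided the service accepts that interval value.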

The return parameter lets me specify which information I want to receive from the Discovery service, which can reduce both the amount of data that is passed back and the time needed to process it. Because I'm interested only in the aggregation result, I tell the service to return only aggregations. For more information on what data you can aggregate together, look at the Discovery service's query reference.

failure: { error in
failure(.other(error.localizedDescription))},

If the query fails, whether because of incorrect aggregation syntax, a network error, or another problem, I return the error.

success: { response in
    if let responseData = try? JSONSerialization.data(withJSONObject: response.json, options: []) {
        var graphSentiments = [GraphSentiment]()
        let json = JSON(data: responseData)

Otherwise, I parse the data that is returned by the Discovery service into GraphSentiment objects that I use to construct the graph. The SDK returns all Discovery responses as raw Foundation objects because the shape of the returned object varies with the data ingested and with how I query it. While this makes it harder for a strongly typed language like Swift to unpack and use the data, it's not impossible. After I serialize the data into JSON using Foundation's JSONSerialization class, SwiftyJSON comes to the rescue! The json object is now a JSON object that I can use much like a Dictionary, by accessing its keys and values.

The caveat in unpacking the JSON data is that I rely heavily on knowing how the data is returned to determine how to unpack it. Unfortunately, the only way of knowing how the data is returned is by printing out what the object looks like, for example with print("graph data = \(json)"). Upon doing so, you'll see this ugly, raw format:

{
  "results" : [
    {
      "id" : "4ef583c4-c7af-4c5c-bba4-189197606c57",
      "score" : 1
    },
    {
      "id" : "a2848c63-7f2a-40e7-bcd2-5614ee50c585",
      "score" : 1
    },
    {
      "id" : "79102dd5-2129-4e81-bcbd-79bc814c12da",
      "score" : 1
    },
    {
      "id" : "5e2b5685-1e08-4706-a470-175151c3b383",
      "score" : 1
    },
    {
      "id" : "2cf2c9ac-29dc-4e1f-b465-e7dc7fc1401e",
      "score" : 1
    },
    {
      "id" : "63488b40-56ae-4cbb-91ad-209fd40737a1",
      "score" : 1
    },
    {
      "id" : "ae8d0803-ee94-4b25-97e7-a7832fc30e15",
      "score" : 1
    },
    {
      "id" : "bbc28f66-744f-4487-ae49-f944e7a3223e",
      "score" : 1
    },
    {
      "id" : "33113ebf-20e5-498b-bf9f-5450f88b7717",
      "score" : 1
    },
    {
      "id" : "0b087b5d-542c-4f9a-bbb7-3d7e92107321",
      "score" : 1
    }
  ],
  "matching_results" : 4665,
  "aggregations" : [
    {
      "match" : "app_name:Google Maps - Navigation & Transit",
      "matching_results" : 480,
      "type" : "filter",
      "aggregations" : [
        {
          "interval" : "1d",
          "results" : [
            {
              "key" : 1488412800000,
              "matching_results" : 35,
              "aggregations" : [
                {
                  "type" : "term",
                  "field" : "review_enriched.docSentiment.type",
                  "results" : [
                    {
                      "key" : "positive",
                      "matching_results" : 18
                    },
                    {
                      "key" : "negative",
                      "matching_results" : 15
                    },
                    {
                      "key" : "neutral",
                      "matching_results" : 2
                    }
                  ]
                }
              ],
              "key_as_string" : "2017-03-02T00:00:00.000Z"
            },
            {
              "key" : 1488499200000,
              "matching_results" : 31,
              "aggregations" : [
                {
                  "type" : "term",
                  "field" : "review_enriched.docSentiment.type",
                  "results" : [
                    {
                      "key" : "positive",
                      "matching_results" : 18
                    },
                    {
                      "key" : "negative",
                      "matching_results" : 13
                    }
                  ]
                }
              ],
              "key_as_string" : "2017-03-03T00:00:00.000Z"
            },

From all this information returned, I care only about the values within the aggregations key to construct the graph.

// Unwrap first aggregation response returned by Discovery service. Unwrapping over response index and corresponding value.
guard let (_, firstAggregation) = json["aggregations"].first else {
    failure(DiscoveryErrors.unexpectedJSON)
    return
}
// Safely unwrap second part of aggregation json response returned by Discovery service.
guard let (_, secondAggregation) = firstAggregation["aggregations"].first else {
    failure(DiscoveryErrors.unexpectedJSON)
    return
}

After grabbing the value for the first "aggregations" key, I need to unpack the next "aggregations" key that contains most of the values I care about.

let timeSlice = secondAggregation["results"]
for (_, timeSliceInterval) in timeSlice {
    let time = timeSliceInterval["key_as_string"].stringValue
    var positiveSentiment = Sentiment(type: "positive", matchingResults: 0)
    var negativeSentiment = Sentiment(type: "negative", matchingResults: 0)
    // Iterating over array's index and its corresponding value
    guard let (_, timeSliceIntervalResults) = timeSliceInterval["aggregations"].first else {
        failure(DiscoveryErrors.unexpectedJSON)
        return
    }
    for (_, sentiment) in timeSliceIntervalResults["results"] {
        guard let matchingResults = Int(sentiment["matching_results"].stringValue) else {
            failure(DiscoveryErrors.stringToIntFailed)
            break
        }
        if sentiment["key"] == "positive" {
            positiveSentiment.matchingResults = matchingResults
        }
        if sentiment["key"] == "negative" {
            negativeSentiment.matchingResults = matchingResults
        }
    }
    let graphSentiment = GraphSentiment(date: time, positiveSentiment: positiveSentiment, negativeSentiment: negativeSentiment)
    graphSentiments.append(graphSentiment)
    success(graphSentiments)
}

The above code snippet does the bulk of the work of gathering the pieces of data I need to construct the graph. I decided to create a Sentiment model to store the data and to let the GraphsViewController store the positive and negative sentiments under the same point in time. I return the array of GraphSentiments constructed. The app's other two features unpack the data that I receive from the Discovery service in a similar fashion.
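For readers who want to experiment with the response shape outside Xcode, the same filter, timeslice, and term unpacking can be sketched in Python. This is an illustrative analogue rather than the app's code: parse_sentiment_response and the namedtuple are hypothetical, the sample is a trimmed copy of the response printed earlier, and this version returns the list once, after the loop finishes.

```python
from collections import namedtuple

# Hypothetical stand-in for the app's GraphSentiment model.
GraphSentiment = namedtuple("GraphSentiment", "date positive negative")


def parse_sentiment_response(response):
    """Walk filter -> timeslice -> term and return one GraphSentiment per day."""
    try:
        filter_agg = response["aggregations"][0]
        timeslice = filter_agg["aggregations"][0]
    except (KeyError, IndexError):
        # Plays the role of DiscoveryErrors.unexpectedJSON in the Swift code.
        raise ValueError("unexpected JSON")

    points = []
    for interval in timeslice["results"]:
        counts = {"positive": 0, "negative": 0}  # neutral is ignored for the graph
        term_agg = interval["aggregations"][0]
        for bucket in term_agg["results"]:
            if bucket["key"] in counts:
                counts[bucket["key"]] = int(bucket["matching_results"])
        points.append(GraphSentiment(interval["key_as_string"],
                                     counts["positive"], counts["negative"]))
    return points


# Trimmed copy of the response shown earlier in the article.
sample = {
    "aggregations": [{
        "type": "filter",
        "aggregations": [{
            "interval": "1d",
            "results": [
                {"key_as_string": "2017-03-02T00:00:00.000Z",
                 "aggregations": [{"type": "term", "results": [
                     {"key": "positive", "matching_results": 18},
                     {"key": "negative", "matching_results": 15},
                     {"key": "neutral", "matching_results": 2}]}]},
                {"key_as_string": "2017-03-03T00:00:00.000Z",
                 "aggregations": [{"type": "term", "results": [
                     {"key": "positive", "matching_results": 18},
                     {"key": "negative", "matching_results": 13}]}]},
            ],
        }],
    }],
}

points = parse_sentiment_response(sample)
for point in points:
    print(point)
```

Each day collapses to a date plus two counts, which is all a two-series sentiment graph needs.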

Conclusion

The Discovery service is a great tool for beginning data scientists who want to explore and find patterns in any type of data. The sheer amount of data that exists and is created every day makes it impossible for a single person to scour it all. The most challenging part is determining how to remove the noise and concentrate on relevant data that provides direct insights. With the Discovery tool, you can easily play around with any amount and type of data that you want to learn more about. Create a collection, ingest the documents, and begin querying inside the tool. After you determine which queries and aggregations best present your discovered insights, there are multiple SDKs that you can use to manipulate and present the data you found. The Swift SDK is just one of the many Watson SDKs. I look forward to seeing how you take advantage of the Discovery service and what you create. Now that you've seen what Watson Discovery can do, try it for yourself.

Published May 22, 2017