Tutorial: GitHub Traffic Analytics with Cloud Functions and Cloud Foundry

Architecture - Traffic Analytics

In a new solution tutorial, I show you how to automatically retrieve and store GitHub traffic data the serverless way with IBM Cloud Functions and Db2. The data can then be analyzed in a web app deployed to Cloud Foundry on IBM Cloud. The app is secured with App ID using OpenID Connect. The new Dynamic Dashboard Embedded service visualizes the views and clones of GitHub repositories.

Tutorial overview

Many of my open source projects are hosted on GitHub. In the “Insights” section of each repository, I can see statistics on views and clones of my repositories. This is great, but GitHub only provides access to the traffic data for the last 14 days. If you want to analyze statistics over a longer period of time, you need to download and store that data yourself. In this new tutorial, you deploy a serverless action to retrieve the traffic data and store it in a SQL database. Moreover, a Cloud Foundry app is used to manage repositories and provide access to the statistics for data analytics. The app and the serverless action discussed in the tutorial implement a multi-tenant-ready solution with the initial feature set supporting single-tenant mode. The code is available on GitHub.

Automated, scheduled data retrieval

To automatically retrieve the GitHub traffic data and merge it into the Db2 database, the tutorial uses the built-in alarms package of IBM Cloud Functions. The package lets you fire triggers on a regular schedule and supports cron-like syntax. An action written in Python runs once a week, makes the necessary GitHub API calls, retrieves the data, and merges it into the database. The access token that authorizes the Python action is managed in the same database, along with the repository metadata. The database could be used by multiple tenants, i.e., different users with their own set of GitHub repositories and access token.
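The merge step is what makes repeated runs safe: each call to the GitHub traffic API returns up to 14 days of daily counts, so weekly runs overlap, and days already stored must be updated rather than duplicated. The following is a minimal sketch of that idea, not the tutorial's actual code. It assumes a hypothetical table `repo_views`, uses the standard-library `sqlite3` module as a stand-in for Db2, and uses SQLite's `INSERT OR REPLACE` where Db2 would use a `MERGE` statement. The function names are illustrative.

```python
import json
import sqlite3
import urllib.request

GITHUB_API = "https://api.github.com"


def fetch_views(owner, repo, token):
    """Call the GitHub traffic API and return the list of daily view records.

    Each record looks like {"timestamp": "...", "count": n, "uniques": m}.
    """
    req = urllib.request.Request(
        f"{GITHUB_API}/repos/{owner}/{repo}/traffic/views",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["views"]


def create_table(conn):
    """Create the (hypothetical) traffic table, keyed by repository and day."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS repo_views (
               repo    TEXT NOT NULL,
               day     TEXT NOT NULL,
               views   INTEGER NOT NULL,
               uniques INTEGER NOT NULL,
               PRIMARY KEY (repo, day))"""
    )


def merge_views(conn, repo, records):
    """Insert new days and overwrite days that are already stored."""
    rows = [
        (repo, r["timestamp"][:10], r["count"], r["uniques"])
        for r in records
    ]
    # SQLite stand-in for a Db2 MERGE: replace the row on a key collision.
    conn.executemany(
        "INSERT OR REPLACE INTO repo_views VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

A scheduled action would call `fetch_views` for each registered repository and pass the result to `merge_views`. Because the primary key is the pair of repository and day, running the action more often than weekly would not create duplicate rows.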

Table with repository traffic

Web-based data analytics

A Python Flask app provides access to the traffic statistics. It also lets users add or delete repositories. The app is protected by the App ID service. The tutorial uses the Cloud Directory, with the users managed by App ID. However, social logins (Google, Facebook) could easily be used, too. In the app, an OpenID Connect module interacts with App ID as the authentication provider. After a successful login, users have access to their role-specific functionality only. Administrators cannot access data, and the roles of tenant and tenant-viewer (read-only) are supported.
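To illustrate the role-specific access described above, here is a small framework-independent sketch. It is not taken from the tutorial's app. The permission names, the `ROLE_PERMISSIONS` mapping, and the `requires` decorator are assumptions made for illustration; only the three roles (administrator, tenant, tenant-viewer) and their general intent come from the text.

```python
from functools import wraps

# Hypothetical mapping of roles to permissions. Administrators get no
# data access, tenants manage and view, tenant-viewers are read-only.
ROLE_PERMISSIONS = {
    "admin": {"manage_system"},
    "tenant": {"view_stats", "manage_repos"},
    "tenant-viewer": {"view_stats"},
}


class Forbidden(Exception):
    """Raised when the logged-in user lacks the required permission."""


def requires(permission):
    """Decorator that checks the user's role before running the handler."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            allowed = ROLE_PERMISSIONS.get(user.get("role"), set())
            if permission not in allowed:
                raise Forbidden(f"{user.get('role')} may not {permission}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@requires("view_stats")
def list_stats(user):
    return f"stats for {user['name']}"


@requires("manage_repos")
def add_repo(user, repo):
    return f"{repo} added for {user['name']}"
```

In a Flask app, the `user` argument would typically be replaced by a lookup in the session that the OpenID Connect login populates; passing it explicitly keeps the sketch runnable on its own.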

The web client uses the jQuery plugin DataTables to display and filter the raw data. For data visualization, the client embeds dashboards of the new IBM Cloud service Dynamic Dashboard Embedded. Depending on the mode, either a so-called canned dashboard with pre-defined visual elements is shown or users can assemble their own dashboard from a set of given visualizations. The newly defined dashboards can be exported and used in future sessions.

Embedded dashboard showing repository views

Conclusions

By combining serverless and Cloud Foundry, it is possible to overcome the limited availability of GitHub traffic statistics. The data is automatically downloaded and stored in a database. A web app provides access to the data and allows analytics either by filtering the data tables or by using dashboards with data visualizations. Read all the details in the tutorial and check out the code provided in this repository.

If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.

Technical Offering Manager / Developer Advocate
