February 8, 2016 | Written by: Aaron Baughman
Categorized: Customer Stories | Hybrid Deployments
by Aaron K. Baughman & Michelle Welcks
Each player has a pivotal moment during the tournament when even the casual fan is drawn to their play. Angelique Kerber’s moment happened during her match against Britain’s Johanna Konta, when the Twitter reaction to her play drew close to 3,500 positive tweets. Positive tweets for Kerber continued to spike as fans embraced her in the lead-up to the final against Serena Williams. Twitter erupted when Kerber won, two sets to one. By the end of the match, Kerber’s popularity had been sustained for over 24 hours:
Australian Open 2016 women’s winner, Angelique Kerber, showed steadily increasing Twitter popularity
On the men’s side, Novak Djokovic proved to be the world’s best tennis player, yet again. He defeated Andy Murray 6-1, 7-5, 7-6 (3). Djokovic’s popularity over Twitter surged when he played Roger Federer on January 28! After Federer’s defeat, it seemed as though the crowd expected Djokovic to maintain his winning posture with a slight yet steady increase of positive tweets for each subsequent match:
The world’s best men’s tennis player, Novak Djokovic, saw a surge in positive tweets during the Federer match
The Social Sentiment Application was continuously available throughout the Australian Open 2016 with impressive results. The IBM hybrid cloud architecture leveraged the strengths of Bluemix, SoftLayer and IBM’s Private Cloud. Docker containers enabled rapid and reusable functions throughout the application. The Software Defined Environment on IBM’s Private Cloud leveraged the principles of agility and idempotency. The Bluemix Streaming Analytics microservice enabled real-time processing of hundreds of thousands of tweets. Flexible networking within Bluemix connected multiple clouds together within a secure environment. The Social Sentiment Application just worked!
Hybrid Cloud handles high volume data
Throughout the tournament, the Social Sentiment Application provided a real-time Twitter gauge for each player. Hundreds of thousands of tweets were processed and correlated to each player while running on multiple cloud platforms. The hybrid cloud that spanned Bluemix, SoftLayer, and IBM’s Private Cloud provided distributed microservices, network bandwidth, containers and platforms to process large volumes of data.
The number of messages passed within the Bluemix Social Sentiment Application throughout the tournament was large and dense. In total, Streams processed 3,688,945 tweets that matched 939 Twitter rules. The rules for the Australian Open 2016 included Twitter handles, nicknames and lemma names. For instance, Serena Williams’s rules consisted of teamserena, serenawilliams, serenafriday and serena williams. The analysis of each tweet took 16.3 seconds, with the majority of that time spent in the Natural Language Processing algorithms. In total, 151,561 tweets, or 4%, were filtered because more than 20% of their characters were non-Latin. None of the tweets were filtered because of network or streams processing congestion.
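The non-Latin filter can be sketched in a few lines. This is an illustrative Python approximation rather than the production code; in particular, treating “Latin” as ASCII letters is an assumption, since the post does not define the exact character classes:

```python
def should_filter(tweet: str, min_latin: float = 0.20) -> bool:
    """Drop tweets whose share of Latin letters falls below min_latin.

    "Latin" is approximated here as ASCII letters. Non-letter
    characters (spaces, digits, emoji, '#', '@') are ignored when
    computing the ratio.
    """
    letters = [ch for ch in tweet if ch.isalpha()]
    if not letters:
        return True  # nothing for the NLP algorithms to analyze
    latin = sum(1 for ch in letters if ch.isascii())
    return latin / len(letters) < min_latin
```

A tweet like “Go Serena! #AusOpen” passes, while an all-kana or all-kanji tweet is filtered.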
The User Datagram Protocol (UDP) was the transport for the Social Sentiment Application’s statistics gathering. Each piece of data that provided insight into the running application was packaged into a UDP datagram that traversed the Bluemix network into IBM’s private network. The total number of UDP datagrams exceeded 25 million. The values included the number of tweets filtered for exceeding the non-Latin character threshold, Processing Engine ARchive (PEAR) updates, player comma-separated-value file updates, the number of tweets filtered due to network congestion and the tweets that were successfully analyzed by Natural Language Processing (NLP) algorithms.
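Since the datagrams were ultimately aggregated by StatsD, each statistic can be pictured as a line in the StatsD protocol, “metric:value|type”. The sketch below is illustrative: the metric name and collector address are assumptions, not the ones the application actually used.

```python
import socket

def format_stat(metric: str, value: int, kind: str = "c") -> bytes:
    """Encode one statistic in the StatsD line protocol: "metric:value|type".
    Type "c" marks a counter that StatsD aggregates per flush interval."""
    return f"{metric}:{value}|{kind}".encode("ascii")

def send_stat(sock: socket.socket, addr, metric: str, value: int) -> bytes:
    """Fire-and-forget: package the stat as a UDP datagram and send it."""
    payload = format_stat(metric, value)
    sock.sendto(payload, addr)
    return payload

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Hypothetical collector address; the real datagrams crossed from
# Bluemix into IBM's private network.
collector = ("127.0.0.1", 8125)
datagram = send_stat(sock, collector, "tweets.filtered.nonlatin", 1)
```

Because UDP needs no connection handshake, a lost collector never blocks the sending application.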
Every second, 24 tweet-related messages were sent through the Social Sentiment Application. Considering that the raw tweets were aggregated into 5-minute intervals, that is quite impressive. Throughout the duration of the tournament, over 3,960 5-minute intervals were created within IBM Streaming Analytics.
Hybrid cloud architecture for Social Sentiment app
The overall architecture of the Social Sentiment Application brings together Bluemix, SoftLayer and an IBM Private Cloud. Bluemix is a multitenant cloud platform as a service that is built on Cloud Foundry open technology running on SoftLayer infrastructure:
Hybrid Cloud Architecture for the Social Sentiment Application spans Bluemix, SoftLayer and IBM’s Private Cloud
From the Bluemix catalog of services, the Streaming Analytics service was provisioned and sized with 2 nodes containing 4 virtual cores, 12GB of RAM and a 1 Gbit/second network. One job was deployed on the running Streaming Analytics instance as a Streams Application Bundle (SAB) file to connect to Twitter. The second job contained the social sentiment logic and the UDP sinks whose messages traveled to the IBM Private Cloud.
A custom image was created from the base Liberty image in the Bluemix Docker repository. A Dockerfile described, in code, how to generate a custom image so that a Docker container could be instantiated with an application developed on the Netty framework. (Netty is a NIO client-server framework also used by Apache Spark, Apple, Airbnb, Cisco, Facebook, Google, Minecraft, and Twitter.)
The container exposed 3 ports: one for UDP, one for TCP and another for SSH. Appropriate public keys were copied into the container so that secure shell login was supported. The Java home variable was set, and a custom-developed Netty Java ARchive (JAR) was copied into the container and made executable with chmod. The entry point for the container was defined such that the Netty program listened on the UDP port and forwarded packets to the IBM Private Cloud.
The Netty-based application listened on a given port, bound to a provided IP address, and forwarded UDP datagrams to a specified destination. Each packet was forwarded to StatsD for aggregation.
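The relay logic can be sketched in Python; the production component used Netty’s non-blocking event loop rather than blocking socket calls, and the addresses below are illustrative:

```python
import socket

def make_forwarder(listen_addr, dest_addr):
    """Bind a UDP socket on listen_addr and return (socket, forward_once).

    forward_once relays a single datagram to dest_addr unchanged,
    mirroring what the Netty-based forwarder did continuously.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(listen_addr)

    def forward_once(bufsize=65535):
        data, _src = sock.recvfrom(bufsize)  # block until a datagram arrives
        sock.sendto(data, dest_addr)         # relay the payload unchanged
        return data

    return sock, forward_once
```

In production the forward step runs in an endless loop; Netty achieves the same effect with asynchronous channel handlers instead of a blocking `recvfrom`.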
A Software Defined Environment (SDE) was built on the IBM Private Cloud. Chef was used to converge a Red Hat virtual machine with StatsD, IBM HTTP Server (IHS) and Graphite cookbooks. The StatsD cookbook installed all of the UDP aggregation processes and only reported success when each unit test passed. In addition, the Graphite cookbook, which includes Carbon and Whisper, was installed to listen for UDP packets and to store messages on disk. The Carbon process accepted packets from StatsD and forwarded them to Whisper for input into a fixed-size database. The web server, IHS, was installed through an included IHS cookbook to expose a RESTful API for the retrieval of time series data.
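Between StatsD and Carbon, each data point travels in Carbon’s plaintext protocol, one “metric value timestamp” line per point; the metric name below is an illustrative assumption:

```python
import time

def carbon_line(metric: str, value: float, timestamp=None) -> bytes:
    """Format one data point in Carbon's plaintext protocol:
    "<metric path> <value> <unix timestamp>\n". Whisper then stores the
    point in its fixed-size, time-bucketed database."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{metric} {value} {timestamp}\n".encode("ascii")
```

StatsD’s Graphite backend emits exactly such lines each flush interval, typically to Carbon’s TCP port 2003.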
Bluemix Docker containers have big impact
Docker containers provided a lightweight, encapsulated UDP forwarder component to rapidly deploy parts of the Social Sentiment Application in Bluemix. In general, Docker is great for:
- Managing disk images
- Image distribution for servers built with Chef or Software Defined Environments
- Write once, run anywhere
- Collaboration around an infrastructure’s Operating System
- Application testing
- Lightweight encapsulation of an entire application stack
A Bluemix Docker container wrapped the supporting social sentiment code within a complete file system that contained system tools, system libraries and runtime environments. Several layers of configuration were expressed within a Dockerfile, starting from a base image in the Bluemix image repository. The layers of configuration included the opening of ports, copying of code, file permission settings, public key management and environment variable initialization. The image could be instantiated into a container on any operating system that has Docker installed.
The Netty JAR was added to the /bin directory of the container so that the java process could start. The entry point of the container was a shell script that started the Java UDP forwarder. Upon container instantiation, Netty accepted tweet-driven UDP messages.
The resulting social sentiment Docker image was uploaded to the private Bluemix image repository and to the publicly available Docker Hub (hub.docker.com/r/baaron/libertyudp). Each time a new container was created in Bluemix, Cloud Foundry was invoked to map TCP, UDP and SSH ports to the uploaded image in Bluemix.
Tweets are processed in real time
Real-time processing of high-volume data was orchestrated with IBM Streaming Analytics. A streams processing pipeline maintained Twitter fidelity for high-quality social sentiment distillation. Every half hour, an accurate social sentiment file was produced and published to IBM SlamTracker and the social sentiment visual shown in the introduction. As the tennis tournament progressed, the number of tweets pulled from Twitter’s PowerTrack steadily increased. Each tweet was pushed through the pipeline:
Initial stages of processing include pulling tweets from Twitter’s PowerTrack and filtering them
In the figure above, every tweet was converted into a streams tuple and included in a 5-minute moving window. A UDP datagram was sent to IBM Private Cloud that subsequently entered Graphite to count the number of tweets that entered the system over time. Any tweet that did not have at least 20% Latin characters was filtered. Upon the filtering action, a UDP message was aggregated to count the number of eliminated tweets:
Natural Language Processing (NLP) of each tweet is sent to 1 of 3 analytic engines. UDP messages accumulate statistics about each analytic engine.
Each streams tuple that did not take more than 180 seconds to arrive at the Natural Language Processing elements was sent to 1 of 3 analysis engines. The count of every expired and accepted tweet was accumulated in Graphite through a UDP message. As a result, congestion problems in streams could be monitored. During the entire Australian Open 2016, the application did not experience any network congestion! Each analytic engine pulled a PEAR file from ObjectStorage to update tennis dictionaries and lexicons, since slang routinely evolves. The processing time of every tweet was saved along with the number of tweets that were analyzed. The results of the sentiment analysis were unioned before being sent downstream for further processing:
Analysis results are joined with an Association of Tennis Player (ATP) identification and published to a JSON social sentiment file. The JSON file is stored within ObjectStorage
Next, tennis player names and their respective ATP identification number were pulled from ObjectStorage. Each twitter tuple was annotated with the identifier so that a standard player identification number could be used within the JSON social sentiment file. The number of player ATP identification refreshes was sent to Graphite on IBM’s Private Cloud. Finally, each streams tuple was converted into a JSON structure and stored within ObjectStorage.
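The join and publish steps amount to annotating each sentiment tuple with the player’s standard identifier before serializing to JSON. A minimal Python sketch follows; the player table, identifiers and field names are hypothetical, and the real mapping was pulled from ObjectStorage:

```python
import json

# Hypothetical name-to-ID table; the production table came from ObjectStorage.
PLAYER_IDS = {"novak djokovic": "D643", "angelique kerber": "KE32"}

def annotate(sentiment_tuple: dict) -> dict:
    """Join one sentiment tuple with the player's identifier so the JSON
    feed keys on stable IDs rather than free-text player names."""
    name = sentiment_tuple["player"].lower()
    return {**sentiment_tuple, "player_id": PLAYER_IDS.get(name)}

def to_json(records) -> str:
    """Serialize the annotated tuples into the social sentiment file."""
    return json.dumps([annotate(r) for r in records])
```

A tuple such as `{"player": "Novak Djokovic", "sentiment": 0.8}` comes out carrying `"player_id": "D643"`.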
Software-defined environment is agile
The IBM Private Cloud provides Continuous Availability Services to customers whose business-critical web and application workloads require continuous global availability with zero maintenance downtime. As customers become more diversified and agile, services must be provisioned rapidly and consistently. The software-defined environment decreased the total time to provision Graphite, StatsD and IHS from 12 hours to 20 minutes. SDE within IBM’s Private Cloud provides:
- Complete time series data source build
- Automated architecture
- Single automation language – Ruby
- Infrastructure code repository
- Node convergence unit tests
- Idempotent operations
The Graphite virtual machine was configured with Chef by converging two cookbooks that have multiple recipes and other dependent cookbooks. The Graphite cookbook has 3 recipes. The first recipe installs the base Graphite, including Carbon and Whisper, and calls the IHS cookbook. The IHS recipe configures the web server, while the log recipe determines the output location for standard output and errors. The next part of the application stack installs the StatsD cookbook, which includes Python modules. Kitchen tests enable the use of Vagrant and VirtualBox to run unit tests during build and before convergence. (See Continuous Availability at IBM to learn how we use Chef for events on chef.io.)
After the Graphite node was converged, UDP messages could be sent to StatsD. A web front end exposed RESTful services so that time series data about the Social Sentiment Application could be retrieved. Example curl queries returned JSON for: a summary of all tweets that expired due to Bluemix network congestion (of which, it turned out, there were none); the total number of tweets filtered due to content errors every 24 hours; and each refresh of the PEAR file.
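Such queries can be sketched against Graphite’s standard render API. In this Python sketch the hostname and metric paths are assumptions, while the render parameters (target, from, format=json) and the summarize function are standard Graphite:

```python
from urllib.parse import urlencode

def render_url(host: str, target: str, frm: str = "-14d") -> str:
    """Build a Graphite render-API URL that returns a JSON time series."""
    query = urlencode({"target": target, "from": frm, "format": "json"})
    return f"http://{host}/render?{query}"

host = "graphite.example.internal"  # assumed IHS front end
# Tweets expired due to network congestion (there were none):
expired = render_url(host, "stats.tweets.expired")
# Tweets filtered for content errors, summed per 24-hour bucket:
filtered = render_url(host, 'summarize(stats.tweets.filtered,"24h","sum")')
# PEAR dictionary refreshes:
pear = render_url(host, "stats.pear.refresh")
```

Each URL could then be issued with curl to pull the JSON series.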
Flexible networking connects clouds
We developed and deployed the Social Sentiment Application across three cloud environments using a robust, high-volume microservice architecture, without being hampered by the necessities and downtime of in-flux network configurations. As a result, we were able to focus on software development and our code. Bluemix networking is flexible, robust and convenient for connecting multiple clouds. For example, Bluemix dynamically managed route tables as services and containers were created and changed. Integrated Layer 7 routing simplified the deployment process.
Flexible networking in Bluemix enabled Social Sentiment app to send UDP datagrams to IBM’s Private Cloud
The networking architecture for the Social Sentiment Application sent UDP messages from Bluemix through the Internet and finally to IBM’s Private Cloud. Even though UDP is a send-and-forget protocol, packet loss amounted to at most a few percent. The tweet counts in Graphite were contrasted with IBM Streaming Analytics’ accumulation of tweets to benchmark UDP delivery to IBM’s Private Cloud. All of the message passing within IBM Bluemix utilized the Docker UDP forwarder’s private IP address. A reserved public IP address was bound to the Docker container so that IPsec and firewalls could restrict where traffic originated and which port accepted it.
As Docker containers were created to forward UDP messages to IBM’s Private Cloud, Cloud Foundry provided mechanisms to manage public IP addresses. The flexible IP management was important because opening network flows between clouds and securing connections with IP tables required a static IP address. Though ephemeral, the Netty Docker container was able to maintain the same IP address between life cycles. The first step was to request an IP address with the command cf ic ip request. The cf ic ip list command verified that the IP address was obtained and listed any other IP addresses associated with the region. After an IP address was obtained, any container could be bound to the public IP address using the cf ic ip bind command. When the container no longer needed the IP address, the resource was released back into the global pool of IP addresses: the Netty Docker container was unbound with the cf ic ip unbind command, and the public IP address was then released with the cf ic ip release command.
Alternatively, the Docker Netty container was bound to a public IP address upon instantiation by specifying the IP address in front of each port.
Flexible networking also means secure networking that just works. Intrusion detection was automatically enabled for the Social Sentiment Application. In addition, data isolation and audit logs were created for privacy and anomaly detection. (Also see “Bluemix Security” for more details.) The managed network security features allowed the Social Sentiment project to focus on the Australian Open 2016 requirements.
The Social Sentiment Application on a Hybrid Cloud flawlessly handled high-velocity, dense volumes of data in the form of tweets and UDP messages. The use of Docker containers and images, flexible networking, a Software Defined Environment and parallel analytical pipelines were the fundamental building blocks for the Hybrid Cloud. The results during the Australian Open 2016 and the seamless connections between Bluemix, SoftLayer and an IBM Private Cloud exemplify robust Systems of Engagement. In the next blog posting, we will discuss emerging trends in cloud computing as well as future work for the Social Sentiment Application.
Editors: John Kent, Nik McCrory, Brian O’Connell, Herbie Pearthree and Dan Kehn