Predictive Cloud Computing for professional golf and tennis, Part 5
Continuous integration and deployment
This content is part 5 of 8 in the series: Predictive Cloud Computing for professional golf and tennis.
In this tutorial, we provide an overview of the continuous integration and continuous deployment architecture designed and implemented for PCC. We discuss the use of Jenkins continuous integration (CI) and IBM UrbanCode™ Deploy, showing how our unit and integration tests are implemented in Jenkins CI and describing the connection between Jenkins and IBM UrbanCode Deploy. We also show how IBM UrbanCode Deploy can automatically deploy to test environments and enable developers to push changes to pre-production and production environments without manual steps or elevated user privileges on target systems.
Figure 1. Continuous integration/delivery architecture
Jenkins continuous integration
Continuous integration is the practice of automatically detecting source code changes, through either push or pull mechanisms, and the subsequent building of that source code into artifacts that can be automatically tested. CI systems promote testing and the propagation of small changes by offloading and automating the build and test procedures. Our PCC has used CI systems extensively, permitting developers to check in changes to source control so that the code is automatically tested and deployed into test environments.
Jenkins executes builds and tests through the use of job definitions. Job definitions contain a variety of information, including but not limited to source locations, polling intervals, plugins used, build steps, and conditions. The components of a build can be mixed and configured in any combination, providing the flexibility to match any build and testing goal. Jenkins provides many project types, including freestyle, Maven, workflow, and multi-configuration. For PCC, we used a mixture of freestyle and Maven-type projects.
While Jenkins provides numerous built-in functions for continuous integration, the system can be extensively customized through the use of plugins. Predictive Cloud Computing makes extensive use of the Jenkins plugins. Some plugins used by the project include the Maven Integration Plugin, Maven Repository Server Plugin, Artifactory Plugin, Conditional BuildStep Plugin, and the UrbanCode Deploy Plugin.
Figure 2 depicts the 15 Jenkins jobs used to provide continuous integration services for PCC. In general, each job represents a distinct function for PCC that may require separate compilation, testing, or packaging. The colored circles on the left represent the state of the last build where blue is success, red is failure, and yellow is unstable. The weather icons next to the circles indicate the health of the most recent build.
Table 1 lists all the projects in Jenkins and their respective functions in supporting PCC.
Table 1. Projects in Jenkins
| Project | Function |
| --- | --- |
| BigEngine | Performs the multithreaded job runs for big data, establishes the endpoints for RESTful services, and contains the configuration for WebSphere® Liberty Profile |
| BigEngine-Integration | Performs integration testing for the BigEngine project |
| Config | Parses application, tennis, golf, and overall tournament configurations |
| FactorsJar | Feature-extractor algorithms from golf and tennis simulations |
| Http-Proxy-Servlet | Servlet to proxy traffic from BigEngine to the web front end |
| Predictive Cloud Database | Project to build Liquibase changes |
| RacketStream | Project to perform sentiment analysis in InfoSphere Streams |
| scripts | Collection of scripts used to manage BigEngine |
| Shared | Elements shared between PCC projects |
| Shared-Integration | Integration testing for shared projects |
| StreamsLogAggregator | InfoSphere® Streams project to analyze web logs in real time |
| Twitter Analysis | InfoSphere Streams project to perform analysis on tweet volumes |
| TwitterExporter | InfoSphere Streams project to export Twitter feeds to other InfoSphere Streams processes |
| TwitterFeed | InfoSphere Streams project to read from the Twitter API |
| WebLogAnalyzer | Java™ program to extract player mentions from web log content |
The BigEngine job in Figure 2 is the primary job for PCC. This job builds the analytic and decision engine used by PCC to forecast site traffic. The BigEngine-Integration job performs the time-consuming integration testing for BigEngine and is segregated from the BigEngine job to speed up delivery time for small changes. The Twitter Analysis, TwitterExporter, and StreamsLogAggregator jobs create artifacts for deployment within InfoSphere Streams. Additional projects, such as WebLogAnalyzer, provide big data functions that support PCC.
Figure 2. The Jenkins continuous integration jobs for PCC
Figure 3 contains the high-level configuration directives for the BigEngine project. The main element to note is that Jenkins is configured to discard old build information by rotating off any build logs older than the most recent 300 builds. These builds and build logs can be useful for tracking down exactly how and why a build fails. Additionally, this build is restricted to run only on the Jenkins master node. Other builds, such as the InfoSphere Streams–related builds, are executed only on Jenkins clients that support Streams.
Figure 3. BigEngine job configuration in Jenkins
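The rotation policy above can be sketched as a simple retention rule over per-build directories. This is a minimal illustration, not Jenkins' actual implementation; the builds/ layout (one directory per build number) and the count of past builds are assumptions.

```shell
#!/bin/sh
# Sketch of the "discard old builds" policy: keep only the newest 300 build
# records and rotate off the rest. Layout and counts are illustrative.
KEEP=300
BUILDS_DIR=$(mktemp -d)
i=1
while [ "$i" -le 310 ]; do mkdir "$BUILDS_DIR/$i"; i=$((i + 1)); done  # 310 past builds
# Newest first; everything after the first $KEEP entries is rotated off
ls "$BUILDS_DIR" | sort -rn | tail -n +"$((KEEP + 1))" | while read -r OLD; do
  rm -rf "${BUILDS_DIR:?}/$OLD"
done
REMAINING=$(ls "$BUILDS_DIR" | wc -l | tr -d ' ')
echo "$REMAINING builds kept"
```

Jenkins applies the same idea through its "Discard old builds" setting, where the retention count is part of the job configuration rather than a script.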
The source code management configuration for BigEngine, shown in Figure 4, describes how the build system accesses and discovers changes within the source code repository. PCC used Git as its source code repository. Access to an internal Git server is defined, including the credentials necessary for reading from and writing to that repository, within the source code management section of a build. The branch specifier names the specific branch that will be built, such as the Wimbledon branch. For each sporting event, a specific branch was used that enabled the team to manage tournament-specific configurations. Finally, the system was configured to build only when specific files or subdirectories change. In this configuration, a build would be triggered by any change in the source code (BigEngine/src/*) or in the Maven build description.
Figure 4. BigEngine source code management section of Jenkins job
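The path-based trigger described above can be sketched with plain Git commands: compare the last built commit against the branch head, and build only if the diff touches a watched path. The throwaway repository, file names, and commit messages below are illustrative assumptions, not PCC's actual repository.

```shell
#!/bin/sh
# Sketch of a path-based build trigger: fire a build only when new commits
# touch the watched paths (BigEngine/src/*).
set -e
REPO_DIR=$(mktemp -d)
cd "$REPO_DIR"
git init -q
git config user.email "ci@example.com"
git config user.name "ci"
mkdir -p BigEngine/src docs
echo 'v1' > BigEngine/src/Engine.java
git add . && git commit -qm "initial"
LAST_BUILT=$(git rev-parse HEAD)          # commit of the last successful build
echo 'notes' > docs/README && git add docs/README && git commit -qm "docs only"
echo 'v2' > BigEngine/src/Engine.java && git commit -qam "engine change"
# Decide as the polling trigger would: diff since last build vs. watched paths
if git diff --name-only "$LAST_BUILT" HEAD | grep -q '^BigEngine/src/'; then
  DECISION="trigger build"
else
  DECISION="skip build"
fi
echo "$DECISION"
```

In Jenkins, the equivalent check is handled by the Git plugin's "included regions" setting, so no scripting is required in the job itself.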
Build triggers for BigEngine are shown in Figure 5. Build triggers describe how and when Jenkins triggers a build job to produce deployment artifacts for PCC. Two build triggers are defined for the job in Figure 5. The SNAPSHOT dependency trigger causes Jenkins to inspect the Project Object Model (POM) of the project to see if any of the project dependencies are built on this Jenkins server. Jenkins automatically sets up a dependency relationship to build any required downstream components. This assists with continuous integration because any change in a dependency triggers a downstream build. Additionally, the system is configured to poll the source code server every three minutes (H/3) to discover any changes for the execution of a build.
Figure 5. Build triggers for Jenkins job
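The H token in the H/3 polling schedule deserves a note: rather than polling at fixed minutes, Jenkins derives a per-job offset from a hash of the job name, spreading many jobs' polls across the interval so they do not all hit the Git server at once. The sketch below illustrates the idea; `cksum` is a stand-in assumption for Jenkins' internal hash, and the job names are from Table 1.

```shell
#!/bin/sh
# Sketch of what "H" in the H/3 schedule does: each job polls every three
# minutes, at a per-job offset derived from a hash of its name.
for JOB in BigEngine Shared Config; do
  OFFSET=$(( $(printf '%s' "$JOB" | cksum | cut -d ' ' -f 1) % 3 ))
  echo "$JOB polls at minute offset $OFFSET of each 3-minute interval"
done
```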
The build environment definition for BigEngine is shown in Figure 6. Build environment options control many aspects of how the build is completed. For example, build workspaces are deleted before the build starts within BigEngine, ensuring that every build starts from a clean slate. A timestamp is added to the output of each build to trace and track each build. An upstream Maven repository is defined to assist with artifact resolution. Finally, an Artifactory server is defined to resolve private artifacts that are necessary during a build.
Figure 6. Build environment for Jenkins job
Figure 7 contains the pre-build, build, and post-build steps for BigEngine. The pre-build step uses Maven to clean all projects. The build step defines the Maven POM and goals to use for building. BigEngine defines several options that customize a build. The option -T 8 tells Maven to build with eight threads, which speeds up the build process. -X instructs Maven to provide debug output useful for analyzing build failures. The -U option requests that Maven check for updates of snapshots on remote repositories to ensure the build is using the latest artifacts. The -Dskip.integration.tests=true option instructs Maven to skip integration tests because those are handled by a different job. The goals clean and install instruct Maven to clean the build output and perform all phases of the build process up to install. The included phases are validate, compile, test, package, verify, and install into the local repository. Within the final post-build step, Jenkins is instructed to copy over the server.xml for the deployment.
Figure 7. Build steps for Jenkins job
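Putting the options above together, the build step amounts to a single Maven command line. The sketch below echoes the assembled command rather than running it, since executing it requires the BigEngine source tree; the flags are the ones named above.

```shell
#!/bin/sh
# The Maven invocation assembled from the BigEngine build options:
# -T 8 (eight build threads), -X (debug output), -U (force snapshot updates),
# -Dskip.integration.tests=true (defer integration tests to another job),
# and the clean and install goals.
MVN_CMD="mvn -T 8 -X -U -Dskip.integration.tests=true clean install"
echo "$MVN_CMD"
```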
Figure 8 depicts part of the configuration for UrbanCode Deploy. The configuration enables the artifacts built in Jenkins to be sent to UrbanCode Deploy for deployment. The target server for deployment is selected from a drop-down menu, such as a production server. A username and password are configured to authenticate with UrbanCode Deploy. The built artifacts are mapped to a component in UrbanCode Deploy, such as the Forecast Engine. The Base Artifact Directory defines where the Jenkins plugin for UrbanCode Deploy should look for uploadable artifacts. The Directory Offset is an offset from the Base Artifact Directory that further refines which files should be uploaded. The version is used as the component version in UrbanCode Deploy so that the BigEngine version matches the Jenkins build number; it is set automatically by Jenkins on each build. The include directive lists the artifacts from the build process that should be sent to UrbanCode Deploy for each version of a component.
Figure 8. UrbanCode Deploy publish configuration for Jenkins job
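The version handoff can be sketched as follows: the UrbanCode Deploy component version is taken from Jenkins' BUILD_NUMBER so component versions line up with Jenkins builds. The udclient command lines are echoed here as a hedged illustration only; the build number, component name, and artifact directory are assumptions, and flag names should be verified against your UrbanCode Deploy release before use.

```shell
#!/bin/sh
# Sketch of the Jenkins-to-UrbanCode Deploy version mapping. Values are
# illustrative; udclient invocations are printed, not executed.
BUILD_NUMBER=417                      # set by Jenkins in a real build
COMPONENT="Forecast Engine"
VERSION="$BUILD_NUMBER"               # component version mirrors the build number
BASE_DIR="build/artifacts"            # illustrative Base Artifact Directory
echo "udclient createVersion -component \"$COMPONENT\" -name \"$VERSION\""
echo "udclient addVersionFiles -component \"$COMPONENT\" -version \"$VERSION\" -base \"$BASE_DIR\""
```

In PCC, the Jenkins plugin performed this upload automatically after each build, so no scripting was required in the job.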
Figure 9 shows the remaining parts of the UrbanCode Deploy Jenkins plugin configuration. For the BigEngine project, the plugin was configured to automatically deploy changes to the PCC test environment whenever a build is complete. The deploy checkbox, when enabled, instructs the plugin to request UrbanCode Deploy to execute a deployment process. The application to deploy is defined, as well as the process in UrbanCode Deploy, to orchestrate the deployment. Within minutes, any change in source control is reflected in the test environment.
Figure 9. UrbanCode Deploy continuous deployment configuration for Jenkins job
Continuous deployment (CD) is the practice of automatic or automated deployment of build artifacts through one or more deployment environments such as test, staging, and production. CD systems promote propagating small change sets into production because they offload, automate, and orchestrate tedious deployment procedures. PCC used UrbanCode Deploy extensively as its CD system. The promotion of many small change sets rather than large change sets was advantageous for PCC debugging and change rollback.
UrbanCode Deploy topology has applications at the top level. An application consists of a group of components, component processes, and application processes. The components are versioned artifacts with associated processes that describe how to deploy those artifacts. The application processes describe how to orchestrate a deployment across components. Components are associated with a resource, often a computing device, which describes where to run the component processes for a given application. Application processes can be executed across different deployment environments such as test, staging, and production.
For PCC, UrbanCode Deploy was used to deploy all components of the project, including WebSphere Liberty applications, Hadoop jobs, Streams jobs, and the visualization components.
Figure 10 depicts the six deployment environments configured in UrbanCode Deploy for PCC. The environment marked ECC is the test environment. The pre-production environments were distinct staging environments segregated by cloud region. Likewise, production environments were also separated by cloud region, enabling developers to push updates to one cloud region at a time.
Figure 10. UrbanCode Deploy application view for PCC
Figure 11 shows the configuration for the test environment, denoted ECC in this configuration. Every component that can be deployed to production can also be deployed to the test environment. UrbanCode Deploy supports component versioning such that each deployed component version within a specific environment is viewable.
Figure 11. Test environment in UrbanCode Deploy
One of the production environments, with its corresponding components and versions, is depicted in Figure 12. Production Plex 3 mirrors the configuration of the other production plexes. A PCC developer pushed component versions to a single production plex (region) and verified functionality. After verification and a successful system test, the deployment was pushed to the other regions.
Figure 12. Production environment in UCD
Figure 13 shows the high-level application orchestration processes for several deployments. Each of the distinct processes performs a deployment or an action such as a server restart. The PCC user selects a process using the play buttons shown in the corresponding environment views in Figures 10, 11, and 12.
Figure 13. Application processes in UCD
A high-level application process for deploying the Predictive Cloud Forecasting service is shown in Figure 14. First the configuration is deployed, followed by any application server updates, which stop the application server. Next, the application server is started, followed by post-processing scripts.
Figure 14. BigEngine deployment process in UCD
Figure 15 shows the high-level application process for deploying the UIMA components. Many different Java virtual machines and analytic engines are deployed in parallel. After each analytic engine started and joined the deployment, the top-level aggregator engine was deployed. Deployments can be complex yet repeatable with UrbanCode Deploy.
Figure 15. UIMA deployment in UrbanCode Deploy
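The parallel fan-out followed by a join can be sketched with background jobs and `wait`: the analytic engines deploy concurrently, and the aggregator deploys only after all of them finish. The engine names are illustrative, and each `echo` stands in for a real component process.

```shell
#!/bin/sh
# Sketch of the UIMA deployment ordering: engines in parallel, then the
# aggregator once all engines have joined.
for ENGINE in engine-1 engine-2 engine-3; do
  ( echo "deployed $ENGINE" ) &       # each engine deploys as a parallel step
done
wait                                  # join: proceed once every engine is up
AGGREGATOR_STATUS="deployed aggregator"
echo "$AGGREGATOR_STATUS"
```

UrbanCode Deploy expresses the same pattern graphically, with parallel branches in the process editor converging before the aggregator step.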
The steps executed to update a Liberty component, within a component process called "Install Forecast Engine," are depicted in Figure 16. In this process, properties such as the WebSphere Liberty Profile (WLP) installation directory and the WLP user are imported from Chef and used in subsequent automation steps. After the WebSphere Liberty server is stopped, UrbanCode Deploy downloads the server configuration and the application code. Tokens such as ports and security features are then replaced in the server configuration to customize WebSphere Liberty for the deployment environment. Finally, the server is started by another component process.
Figure 16. WebSphere Liberty component deployment process in UrbanCode Deploy
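The token-replacement step can be sketched with `sed` over a server.xml template: environment-specific values, such as the HTTP port, are substituted into the downloaded configuration. The token name, port value, and template below are illustrative assumptions, not PCC's actual server.xml.

```shell
#!/bin/sh
# Sketch of token replacement in a Liberty server.xml: substitute
# environment-specific values into a downloaded template.
set -e
WORK_DIR=$(mktemp -d)
cat > "$WORK_DIR/server.xml.template" <<'EOF'
<server>
  <httpEndpoint id="defaultHttpEndpoint" httpPort="@HTTP_PORT@" host="*" />
</server>
EOF
HTTP_PORT=9085                                    # value for this environment
sed "s/@HTTP_PORT@/$HTTP_PORT/" "$WORK_DIR/server.xml.template" > "$WORK_DIR/server.xml"
TOKENS=$(grep -c 'httpPort="9085"' "$WORK_DIR/server.xml")
echo "$TOKENS token(s) replaced"
```

UrbanCode Deploy ships a token-replacement step for this purpose, so the substitution is configured in the process editor rather than scripted by hand.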
Figure 17 depicts the steps executed to update an InfoSphere Streams component. The component process stops the Streams application, creates the necessary application directories (if not previously created), downloads the Streams application as a tarball and unpacks it, downloads the configuration, and restarts the application.
Figure 17. InfoSphere Streams component deployment process in UrbanCode Deploy
UrbanCode Deploy provides a web-based editor to ease the creation of repeatable and automated deployment processes for complex applications and components of all types. UrbanCode Deploy was used in PCC to deploy and manage versions of components across multiple cloud regions and multiple host environments, such as test, staging, and production. Its flexibility permitted the management of deployment across a diverse set of component types, including Hadoop jobs, Streams jobs, WebSphere Liberty applications, UIMA applications, database changes, and configuration files.
The combination of Jenkins for CI and UrbanCode Deploy for CD and deployment automation increased the reliability of PCC while reducing the time required to promote changes. Figure 18 shows the savings realized during the 2015 Australian Open by moving this project to a CI/CD environment.
Figure 18. Time savings using CI/CD during 2015 Australian Open
In Part 6, we examine predictive modeling with SPSS Modeler and SPSS Statistics. In addition, we will depict the use of Unstructured Information Management Architecture Scaleout (UIMA-AS) for the discovery of feature vectors that predict large web traffic demand spikes.
- Jenkins CI
- IBM BigInsights for Apache Hadoop basics
- IBM UrbanCode Deploy
- IBM Predicts Cloud Computing Demand for Sports Tournaments