To compensate for past mistakes, smart enterprises collect both qualitative and quantitative data and draw conclusions from the gathered information that lead to decisive actions for improved business strategies. With the advent of big data, a new breed of business decision maker evolved—one who recognizes the opportunity to take this practice one step further. The new decision maker anticipates mistakes before they happen, modifies the business strategy accordingly, and spares the company long iterations, slow product growth, lost revenue, and high churn.
Applying these practices to the agile development lifecycle gets high revenue-driving products out the door faster and at a much lower overall cost to the company. After a product is deployed, data points on product performance can be collected across the organization, allowing for further business process optimization.
This article shows how to build a lean analytics strategy that empowers project managers to optimize their development lifecycle by focusing attention only where needed. Doing so gives product teams the opportunity to make real-time decisions about feature implementation. Most importantly, management can make quick decisions about resource allocation and whether to pivot before the company takes a financial hit.
Use lean analytics to optimize business processes
Eric Ries, creator of the Lean Startup methodology, explains that lean methods combine customer development, agile software-development methodologies, and lean manufacturing practices into a framework for developing products and businesses quickly and efficiently. A core concept of lean is a three-step cycle that Ries calls build, measure, learn—the process by which you do everything, from establishing a vision to building product features to developing marketing strategies.
In this article, you learn how to collect important data points during the build phase, then measure that data and learn from it to improve your agile processes.
Think like a data scientist
All too often, project budgets are cut short prematurely because stakeholders expect to see results soon after the first iteration of a product is released. To get those results, the project manager must know how to think like a data scientist from the start of the project. Thinking like a data scientist means knowing how to identify the right metrics, and then knowing how to use the tools that are available to measure them. Tools like IBM® SPSS® are useful for multivariate testing because they identify correlations among data points and suggest which relationships deserve further investigation. Figure 1 shows how IBM SPSS Analytic Catalyst identifies such correlations.
Figure 1. SPSS identifies correlations among data points
The information that is acquired from SPSS is valuable only if it leads to actions that drive the optimization of business processes. In the next section, I look at how to capture key data points that affect your bottom line, including data points you might not have considered before.
Data-driven agile process optimization
Project managers must consider the actual business value of the metrics they track. You might evaluate developers on metrics such as the daily lines of code that are written, the number of daily commits each developer makes to the project's code repository, and the number of bugs that are fixed, but these stats are vanity metrics. Vanity metrics make you feel good, but they don't change your actions. Actionable metrics change your behavior by helping you pick a course of action. Although vanity metrics might have value for motivating developers and reporting to stakeholders, they have no intrinsic value to the survivability of the product. They do not give you insight into the actual quality of the code that was written, nor do they indicate how well it scales.
Consider the following scenario. Imagine that you imported a data set into SPSS to determine why the number of defects that are reported per sprint steadily increased with each iteration of a new software product. After you review the data, SPSS reports a significant correlation between the number of lines of code that are written and the number of bugs that are reported for each sprint. With this information, what actions can you take to mitigate the issue? Would you tell the developers to write less code? Obviously not.
If you factor in a definitive metric that measures the quality of the written code, however, such as cyclomatic complexity, you can implement protocols to make sure that developers maintain the integrity of the code base, leading to a successful continuous delivery program and a manageable backlog. I explore this concept further to demonstrate how lean analytics are used to do what lean practitioners refer to as eliminating waste.
Eliminate waste by using code metrics
When reading code, software developers must keep track of what all of the branches are doing. Studies on short-term memory show that humans have a working memory of seven plus or minus two items, a finding known as Miller's Law (see Resources for more information). More recent research suggests that this number is even lower.
Complex code causes waste by forcing you to allocate resources to fix bugs and by requiring developers to spend more time than necessary reading code. Worst of all, the likelihood of introducing defects or implementing a "bad fix" rises drastically as cyclomatic complexity increases. Table 1 shows the relationship between cyclomatic complexity and the probability that bugs are introduced each time the respective code is touched.
Table 1. The relationship between the cyclomatic complexity of a function and the probability that a modification can introduce defects
| Cyclomatic complexity | Modification defect probability |
| 100 and up | 80% |
The values in Table 1 pertain to a single method or function. An entire class receives a weighted cyclomatic complexity (WMC), which I use for the data analysis example in a moment. This value is the sum of the complexity values of all of the functions or methods that the class contains.
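To make the WMC calculation concrete, here is a minimal sketch in Python that approximates McCabe complexity for each method with the standard-library ast module and sums the values for a class. The simplified rule set and the sample class are illustrative only; production tools apply more complete rules.

```python
import ast

# Node types that add a decision point. This is a simplified rule set;
# dedicated tools handle more cases, such as boolean operators and
# ternary expressions.
DECISION_NODES = (ast.If, ast.For, ast.While,
                  ast.ExceptHandler, ast.comprehension)

def cyclomatic_complexity(func_node):
    """Approximate McCabe complexity: 1 + the number of decision points."""
    return 1 + sum(isinstance(node, DECISION_NODES)
                   for node in ast.walk(func_node))

def weighted_complexity(class_source):
    """WMC for a class: the sum of its methods' complexity values."""
    tree = ast.parse(class_source)
    cls = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
    return sum(cyclomatic_complexity(m)
               for m in cls.body if isinstance(m, ast.FunctionDef))

sample = """
class Order:
    def total(self, items):
        t = 0
        for item in items:         # +1
            if item.taxable:       # +1
                t += item.price * 1.2
            else:
                t += item.price
        return t                   # complexity: 3
    def count(self):
        return len(self.items)     # complexity: 1
"""
print(weighted_complexity(sample))  # 4
```

A class whose WMC keeps climbing from sprint to sprint is a refactoring candidate, which is exactly the signal the analysis later in this article looks for.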
Measuring and tracking code complexity during the build cycle is an excellent example of how to use a metric to increase the long-term value of a software product. The most effective way to measure the complexity of code is to implement test-driven development (TDD) or an incarnation of TDD such as behavior-driven development (BDD). Complexity values are then derived from the number of unit tests that are required to cover functions in the system. In TDD, functions are grouped and tested in a test suite. Each test suite has a number of unit tests. Your testing tool can be used to report the number of unit tests in each suite to agile project management software, such as IBM Rational Team Concert™. That data can then be included in the data set for analysis by SPSS.
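As a sketch of that reporting step, the snippet below uses Python's unittest loader to count the unit tests in each suite; the suites here are hypothetical stand-ins, and in practice you would export these counts to your project-management tool.

```python
import unittest

# Hypothetical test suites; in a real project each suite would live in
# its own module, grouped around one function or feature under test.
class ParserTests(unittest.TestCase):
    def test_empty_input(self): pass
    def test_single_token(self): pass
    def test_nested_expressions(self): pass

class FormatterTests(unittest.TestCase):
    def test_plain_text(self): pass

def tests_per_suite(*cases):
    """Map each TestCase class to the number of unit tests it contains."""
    loader = unittest.TestLoader()
    return {case.__name__: loader.loadTestsFromTestCase(case).countTestCases()
            for case in cases}

print(tests_per_suite(ParserTests, FormatterTests))
# {'ParserTests': 3, 'FormatterTests': 1}
```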
Changing processes mid-cycle to adopt TDD is often impractical for large projects that are already underway. For these situations, another way to measure and track cyclomatic complexity is to integrate an automated cyclomatic complexity-measuring utility into the automated build process of the application. Such tools exist for every language and typically give you other valuable metrics about maintainability and class coupling in the code (see Resources for a link). In the following example, I demonstrate how I used IBM SPSS Statistics to analyze a series of code metrics that are generated in Microsoft® Visual Studio®.
Lean analytics for agile development by using SPSS Statistics
In addition to tracking code complexity, you can use other code quality metrics for agile process improvement, such as metrics that look at code "coupling," or the dependencies that exist between code segments. These metrics measure the extent to which functions, objects, classes, or modules depend on other pieces of code in the system. Naming and scoring each type of coupling generates metrics that are based on the coupling between objects in the system. High coupling values forewarn you of a high probability that defects will be introduced if the class or module must be altered to complete a particular task that is scheduled for the next sprint. The point is that you can derive a number, or set of numbers, that tells you how tightly or loosely coupled a system or set of modules is, and therefore how well it can scale.
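As an illustration of how such coupling numbers can be derived, here is a small Python sketch that computes efferent coupling (Ce), afferent coupling (Ca), and the instability ratio I = Ce / (Ce + Ca) from a hypothetical module-dependency map; real tools extract the dependency map from the code itself.

```python
def coupling_metrics(dependencies):
    """Compute coupling metrics from a module -> imported-modules map.

    Ce (efferent coupling): how many modules this one depends on.
    Ca (afferent coupling): how many modules depend on this one.
    I  (instability):       Ce / (Ce + Ca); 1.0 means the module depends
                            entirely on others, 0.0 means it is only
                            depended on.
    """
    modules = set(dependencies)
    for deps in dependencies.values():
        modules |= deps
    metrics = {}
    for mod in sorted(modules):
        ce = len(dependencies.get(mod, set()))
        ca = sum(mod in deps for deps in dependencies.values())
        instability = ce / (ce + ca) if ce + ca else 0.0
        metrics[mod] = {"Ce": ce, "Ca": ca, "I": round(instability, 2)}
    return metrics

# Hypothetical dependency map; real tools derive this from the code.
deps = {
    "orders":  {"billing", "inventory"},
    "billing": {"inventory"},
    "reports": {"orders", "billing"},
}
for mod, m in coupling_metrics(deps).items():
    print(mod, m)
```

In this toy system, "reports" is maximally unstable (it only depends on others), while "inventory" is maximally stable, so a task that touches "inventory" puts every dependent module at risk.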
Using SPSS Statistics, I analyzed a data set of code metrics that are generated for 10 classes that are contained within a code repository for a project. The data set consists of metrics for each of the 10 classes for nine development iterations (or sprints, in agile terms). I provide a short description of each data point in Table 2.
Table 2. Software package metrics to determine the resilience, maintainability, and scalability of a software system
| Metric | Description | Measure |
| Maintainability index | Valued on a scale of 0 - 100, indicating the relative ease of maintaining the code | Scale |
| Weighted cyclomatic complexity (WMC) | Measures structural complexity; values for each method in the class are totaled to provide a single value | Scale |
| Depth of inheritance | Number of class definitions that extend to the root of the class hierarchy | Scale |
| Class coupling | Number of external dependencies | Scale |
| Lines of code (IL) | Approximation of the IL code line count | Scale |
I then combined the code metrics with metrics for defects that resulted from tasks that are completed after each sprint and imported the information into SPSS, as shown in Figure 2. To follow along as I walk through the data analysis exercises, download the Microsoft Excel® spreadsheet, open a new file in SPSS Statistics, and paste in the Excel data.
Figure 2. The variable view for the data set in use
Note: Make sure that the variables are configured correctly before you run data analysis by setting the Role to either input, target, or both.
Cross-correlation analysis with SPSS Statistics
After the data set is imported and the variable properties are set, you can begin to run analyses. Start by looking for relationships among the variables by running a cross-correlation analysis. Click Analyze > Forecasting > Cross-Correlations. In the Cross-Correlations dialog box, select the variables to analyze. For this analysis, you look at how your code metrics affected the number of reported defects at the close of sprint 5, as illustrated in Figure 3.
Figure 3. The Cross-Correlations dialog box
After SPSS runs the analysis, significant correlations among variables are displayed in the results window. The results begin with a model description and case processing summary and continue with a series of charts like the one in Figure 4, which is the first significant relationship that SPSS discovered in the data set.
Figure 4. A graph that is generated by a cross-correlation analysis that demonstrates a significant positive correlation between WMC complexity and the number of reported defects for sprint 5
Another interesting relationship that SPSS found during the analysis is a significant negative correlation between WMC complexity and the maintainability index for each class, as seen in Figure 5. This finding suggests that as cyclomatic complexity increases, the code becomes more difficult to maintain.
Figure 5. The cross-correlation table that SPSS generated describes a significant negative correlation between weighted cyclomatic complexity and the maintainability index for one class in the sample data set
Using Chart Builder for enhanced data visualization
Now that you have run a cross-correlation analysis, you can create more visualizations to further demonstrate relationships among data points, perhaps to present to the team at the end of a sprint. Use Chart Builder to visualize key information about the project that you learned from running the analysis. You can see an example in Figure 6, where I generated a dual-axis graph to compare the complexity value to defects for one module in the code base for each sprint. The x-axis represents sprints, the bars on the left y-axis represent complexity values, and the green line represents defects.
Figure 6. A dual-axis chart generated by Chart Builder demonstrates the relationship between cyclomatic complexity and defects for one module
To access Chart Builder, click Graphs > Chart Builder. You see a window in which you can custom-build your data visualization, which is shown in Figure 7. As you can see, I selected Dual Axes as the chart category. With the chart category selected, drag the first chart option from the tiled list, labeled Dual Y-Axes with Categorical X-Axis, and drop it onto the preview window. Defining the x and y axes is as simple as a drag-and-drop operation. In Figure 7, I dragged the cyclomatic complexity variable to the y-axis on the left of the chart, the defects variable to the y-axis on the right of the chart, and the sprint variable to the x-axis.
Figure 7. Configuring a dual-axis chart to visually demonstrate the relationship between cyclomatic complexity and defects for a particular module or class
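The same dual-axis layout can be reproduced outside SPSS. Here is a sketch using matplotlib's twinx, assuming matplotlib is installed; the per-sprint values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")            # render off-screen; no display required
import matplotlib.pyplot as plt

# Hypothetical per-sprint values for one module in the code base.
sprints    = [1, 2, 3, 4, 5]
complexity = [12, 15, 19, 24, 30]
defects    = [2, 3, 4, 6, 8]

fig, ax_left = plt.subplots()
ax_left.bar(sprints, complexity, color="steelblue")
ax_left.set_xlabel("Sprint")
ax_left.set_ylabel("Cyclomatic complexity")

# twinx() adds a second y-axis that shares the same x-axis.
ax_right = ax_left.twinx()
ax_right.plot(sprints, defects, color="green", marker="o")
ax_right.set_ylabel("Defects")

fig.savefig("complexity_vs_defects.png")
```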
Using SPSS for predictive analytics
Analyzing data points that describe a code base is useful for uncovering relationships that affect product delivery, but you can obtain even greater value through predictive analytics. For example, you can use the data that is gathered from each sprint to predict the number of defects you might encounter with each code module or class several sprints in advance. This information is valuable for determining how a project might be affected in the future if code is not refactored to eliminate coupling and decrease complexity values. I added fields within SPSS for sprints 10 - 14. If you follow along, you can do the same by adding the fields in the variable view for these sprints and pasting the data values from sprints 1 - 5 into the data view as placeholders.
You can use automatic linear modeling to gather such information. In SPSS, click Analyze > Regression > Automatic Linear Modeling. Then, in the dialog box that appears, choose defects for the target variable and complexity for the predictor variable. Next, click the Model Options tab, and select the Save predicted values to the dataset check box to generate a new column in the data set with a set of values that uses the complexity values as the predictor. Notice in Figure 8 that the software provides predictions for all 14 sprints by using the complexity variable as the predictor. Even though you are only interested in obtaining the values for sprints 10-14, it is interesting to note that the predictions generated for sprints 1-9 are close to the actual values.
Figure 8. Use automatic linear modeling to predict defect values for sprints 10-14
You can run the same procedure to generate values for the complexity of sprints 10-14, then copy those values to their respective columns and create a chart to visualize the predictions, as in Figure 9.
Figure 9. Create a dual-axis chart by using the same process as with Chart Builder to demonstrate predicted outcomes for the next five sprints with a particular module in the code
Earlier, I generated a dual-axis chart to visually represent the data for sprints that were already completed. Now, use the same process with Chart Builder, this time showing both the data for completed sprints and the predicted outcomes for the next five sprints for this code module. Although the data set for this article is simplified and the example analyses are fairly basic, these techniques clearly empower project managers to optimize their agile processes.
In this article, you learned how to track the extensibility, scalability, maintainability, and stability of a software project throughout the agile development lifecycle. You then used those measurements to gain insight into what the future holds for your project and to make key decisions that are based on those predictions. Use these metrics as leading indicators to predict future defects that the team might encounter and act on them for time and resource planning. If a feature that is scheduled for implementation involves modifying a package with a high efferent coupling value and a high instability value, for example, you can prioritize the feature according to the amount of time and resources that are needed for refactoring.
The discovery that a module must be refactored often comes after multiple failed attempts at accomplishing a certain task, resulting in an ever-expanding product backlog, increased effort, and waste. Tracking code metrics helps you avoid such setbacks by alerting you, before the damage is done, that a task might require significant refactoring. With SPSS, you can generate real, quantitative predictive statistics from the data you gather and create data visualizations that help communicate the status of the project and the resources required for successful, continuous product delivery.
Download: Excel data for this article (code_metrics_data.zip, 32KB)
- Check out the IBM white paper, Business Analytics for Big Data: Unlock Value to Fuel Performance (PDF).
- Read the IBM SPSS white paper, IBM SPSS Analytic Catalyst: Automatically Uncovering Key Insights and More from Big Data (PDF).
- Check out Lean Analytics: Use Data to Build a Better Startup Faster by Alistair Croll and Benjamin Yoskovitz (O'Reilly, 2013).
- Check out Dan Woods's post, Why Lean and Agile Go Together (Forbes, 1 Jan 2010).
- Watch this O'Reilly webcast to learn more about lean analytics.
- Find the resources that you need to improve outcomes and control risk in the developerWorks Business analytics zone.
- Read more about Miller's Law.
- Find out more about automated cyclomatic complexity measuring utilities in Eclipse on IBM developerWorks.
Get products and technologies
- Learn more about the SPSS family of products.
- Learn more about Rational Team Concert.