"Computer, what do I do next?"
As a kid, I wanted to be an astronaut. Life was simple then, but simplicity had consequences: I believe I estimated pi to be “about 3” longer than any post-Bronze Age culture ever did. In other words, you wouldn’t have wanted me working out the trajectory for the next Mars explorer. But once I got a calculator, I would check my math every time, and my estimates of the area of circles improved, along with my potential future as an astronaut.
As I’ve gotten older and the problems have gotten harder, my expectation that technology could help me sort through complexity has grown as well. I learned to generate and measure data, but deciding what to do next with my information involved a lot of work. All the activity of my systems and their interaction with the Internet generated lots of numbers, yet the complicated task of making sense of them was left entirely to me. It’s one thing to be able to make the right decision; that is, a decision which, in hindsight, turns out to be the correct answer. But I wasn’t even sure I was making a good decision, one where I chose the path that aligned with the largest number of data points.
Wouldn’t it be great if my software could at least help me make informed decisions about what to do next?
The power of analytics and optimization
IBM Business Analytics and Optimization software brings the power of scientific modeling, statistical analysis, and optimization (along with many other analytic tools) to bear on real-world problems. It’s designed to draw insight and meaning from huge volumes of data, generated not only by what we think of as true “Big Data” activities, but by the normal, workaday problems we all encounter.
Business analytics means that I can analyze a problem to determine the relationships that already exist within the data. It means, for example, that I can analyze what a product or service should cost, based on market factors. I can use optimization packages to tell me the maximum or minimum amount of something I can produce given certain inputs and constraints. I can finally understand what all that data is trying to tell me.
Big Data is certainly a huge driving force behind the need, but the result for all of us is the accessibility of these tools. IBM delivers many of these solutions through a cloud delivery approach, which reduces the footprint of this powerful software in your data center without diminishing the value or power of the solutions delivered. This is coupled with runtime packaging that enables you to embed the necessary APIs in your system, making them part of your system’s capabilities.
For years now, IBM has been at the forefront of analytics in all aspects of its software. You see statistical analytics in our performance evaluation software. You can even use our diagnostic software, such as IBM Support Assistant, to tell you where to look for memory leaks based on the tool’s deep analysis, or evaluate performance statistics from IBM WebSphere® Application Server to help find bottlenecks.
But the offerings in business analytics bring the promise of deep statistical analysis to full fruition. Using the best mathematical algorithms, developed through years of research, they can help you make decisions that are backed by the data that runs your business. This quantitative approach brings a unique dimension to decision support: it enables you to make a “good” decision based on statistical modeling and best-in-class optimization methods.
Simple example: Capacity planning
Maybe you think business analytics is mainly for finance solutions, or maybe you think your application is too small for this kind of approach. But just like any tool, once you see the value and learn to apply its power, you will find all kinds of uses for it – just like a calculator for checking your math, but for decisions derived from your business data.
So then, what kind of things can business analytics bring to my corner of the world?
As a software architect, I have a common problem: capacity planning. Let’s suppose I’m about to deploy a new system. I have a mix of hardware. What’s the relationship of cost to my throughput? And given what the hardware is costing me, what’s the best mix to get the most throughput for the least cost? This simple example will give you the idea of the kind of power behind this software.
I’m pretty sure that faster hardware has a greater cost, but how much? I’ll look at the data to tell me.
Assume I have a spreadsheet of cost and throughput statistics for a set of computers in some imaginary data center. Suppose I’m savvy enough about statistics to know I need a certain sample size to get a meaningful result, so I have about 30 entries. Let’s also suppose that even though I believe cost and CPU clock speed are correlated, I’m not really interested in that for this simple example. The data looks something like this:
| Server entry | Clock speed | Throughput | Cost |
| --- | --- | --- | --- |
This goes on for 30 entries. When finished, it looks like cost is a factor in throughput – but by how much?
I’ll crack open my IBM SPSS statistical package, and do a linear regression to see how much CPU clock speed and cost affect my throughput.
Figure 1. SPSS Statistical Data Editor showing linear regression
Figure 2. Detail on SPSS Data Editor showing linear regression selection dialog
Figure 3. SPSS Data Editor showing linear regression output report
From our coefficient of determination (R²) value of .993 (Figure 3), we see there is definitely a relationship between throughput and both clock speed and cost (the closer R² is to 1, the stronger the linear relationship).
Even better, the data tells us how that relationship holds up (Figure 4).
Figure 4. SPSS Data Editor showing further down on linear regression output report
The coefficients result tells me that a one-dollar change in cost gives me 7.305 additional units of throughput.
You could easily solve this trivial example on a spreadsheet. But what if you had hundreds or thousands of entries, and the relationship wasn’t obvious? Without much work, you could still get the SPSS statistical package to reveal this and many other relationships that are hidden in your data.
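If you would rather script the same idea, an ordinary least squares fit can be sketched in a few lines of Python with NumPy. The data below is a synthetic stand-in for the spreadsheet in the figures (not the article’s actual numbers), so the fitted coefficient and R² will differ from the 7.305 and .993 shown above.

```python
import numpy as np

# Synthetic stand-in for the 30-row cost/throughput spreadsheet.
rng = np.random.default_rng(42)
cost = rng.uniform(200, 800, size=30)     # dollars per server
clock = rng.uniform(1.5, 3.5, size=30)    # GHz
# Assume throughput rises roughly 7.3 units per dollar, plus noise.
throughput = 7.3 * cost + 500 * clock + rng.normal(0, 50, size=30)

# Ordinary least squares: throughput ~ intercept + clock + cost
X = np.column_stack([np.ones_like(cost), clock, cost])
coef, *_ = np.linalg.lstsq(X, throughput, rcond=None)

# Coefficient of determination (R^2)
pred = X @ coef
ss_res = np.sum((throughput - pred) ** 2)
ss_tot = np.sum((throughput - throughput.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"cost coefficient: {coef[2]:.3f}, R^2: {r2:.3f}")
```

The fit recovers a cost coefficient close to the 7.3 used to generate the data, which is the same kind of “dollars in, throughput out” relationship the SPSS report surfaces.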
Another example: Optimize my cost
Now that I understand how cost figures in, how can I get the best throughput for my servers?
For this example, I’ll use IBM ILOG CPLEX Optimizer to help me decide what mix of computers to use for the best cost.
Suppose my three servers look something like this:

| Server | Cost | Throughput |
| --- | --- | --- |
| High (x1) | $600 | 5000 |
| Middle (x2) | $300 | 2000 |
| Low (x3) | $200 | 1000 |
Now, suppose I need at least 30000 throughput units to meet the demands of my application. But I can only get 5 of the “High” throughput servers. What mix of computers should I get to give me the least cost?
For this example, I’ll use the command line interface. From my costs, I know that:
- My high level server (call it x1) costs about $600 and gives me 5000 units of throughput.
- My middle server (x2) costs $300 and gives me 2000 units of throughput.
- My low end (x3) costs $200 and gives me 1000 units of throughput.
- I want to minimize cost, get at least 30000 units of throughput, but can only obtain 5 of the high end (x1) servers.
Here’s how the problem looks:
- Total cost equation: 600x1 + 300x2 + 200x3 = total cost. This is what I want to minimize.
- My decision variables are x1, x2 and x3.
- My constraints are that x1, x2 and x3 must each be at least 0, and that 5000 x1 + 2000 x2 + 1000 x3 >= 30000. Also, x1 must be less than or equal to 5.
Using the CPLEX interactive optimizer, the model looks like Listing 1, and Figures 5 and 6.
Enter example
Minimize 600 x1 + 300 x2 + 200 x3
Subject to 5000 x1 + 2000 x2 + 1000 x3 >= 30000
bounds
x1 <= 5
0 <= x1
0 <= x2
0 <= x3
Integer
x1 x2 x3
end
Figure 5. IBM ILOG CPLEX Optimizer showing command line interface
Figure 6. IBM ILOG CPLEX Optimizer command line interface showing optimize command
Then, we issue the “optimize” command (Figure 7).
Figure 7. Results from IBM ILOG CPLEX Optimizer optimize command
The results (Figure 7) show that the optimal cost is $3800.00. So how many servers do I need? Run this command and see the results in Figure 8:
>display solution variables x1-x3
Figure 8. IBM ILOG CPLEX Optimizer “display solution variables” command
I need to buy 5 high-end servers, 2 middle-tier, and 1 low-end server.
This example shows the optimization using the command line interface. But CPLEX also has an extensive API suite and tooling studio that enable you to add CPLEX capabilities to your own system, so as the data changes, you still get an optimized result.
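To see the same model in script form, here is a sketch of the identical server-mix problem using SciPy’s open-source `milp` solver (an illustration of the math, not the CPLEX API; requires SciPy 1.9 or later):

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# The server-mix model from the article: minimize cost subject to a
# throughput floor, a cap of 5 high-end servers, and whole servers only.
cost = np.array([600, 300, 200])                     # $ per server: x1, x2, x3
demand = LinearConstraint([[5000, 2000, 1000]],      # throughput per server
                          lb=30000)                  # need >= 30000 units
limits = Bounds(lb=[0, 0, 0], ub=[5, np.inf, np.inf])  # at most 5 high-end

res = milp(c=cost, constraints=demand, bounds=limits,
           integrality=np.ones(3))                   # all variables integer

print(res.x, res.fun)   # optimal mix and minimum total cost
```

As in the CPLEX run, the optimal mix comes out to 5 high-end, 2 middle-tier, and 1 low-end server, at a total cost of $3800.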
I always use good tools to make my life simpler. With business analytics software, I now have the tools to help me make good decisions about what to do next based on what’s in my data. With cloud delivery for the tools and runtime APIs, this power is more accessible to me than ever. IBM Business Analytics software is the key to unlocking what your data has been trying to tell you.
I would like to thank my professors at Carnegie Mellon University Tepper School of Business, including Professor Michael Trick, Senior Associate Dean, Professor Francois Margot, Professor of Operations Research, and Professor Gerard Cornuejols, IBM University Professor of Operations Research, for unlocking the promise of statistical analysis and optimization.