Business intelligence on the cheap with Apache Hadoop and Dojo, Part 2: Create eye-catching, interactive reports using the Dojo toolkit

Take advantage of Dojo charts

Understanding your business is always important. Your company can be as agile as you want it to be, but if you do not know the right moves to make, you are driving with your eyes closed. Business intelligence solutions can be prohibitively expensive, and they often require you to retrofit your data to work with their systems. Open source technologies make it easier than ever to create your own business intelligence reports. In this article, the second of a two-part series, you will learn how to take business intelligence data created by Apache Hadoop and use it to power eye-catching, interactive reports using the Dojo toolkit.


Michael Galpin, Software Architect, eBay

Michael Galpin is an architect at eBay and a frequent contributor to developerWorks. He has spoken at various technical conferences, including JavaOne, EclipseCon, and AjaxWorld. To get a preview of his next project, follow @michaelg on Twitter.



31 August 2010


Prerequisites

This article uses the data created with Apache Hadoop in Part 1 of this series and focuses on the Dojo toolkit (Version 1.4). The example uses Google's Ajax APIs to load Dojo into the page, so you do not need to download Dojo yourself. The main skill you need is experience with JavaScript. See Resources for links to these tools.

Graphical reports with Dojo

In the first part of this series, you saw how to crunch big data using Apache Hadoop. In the example, you used Hadoop to process Apache web server access logs. You used Hadoop to turn these logs into business intelligence data that would tell you what web browsers the users of your web site were using to access your site. One of the useful things you did with Hadoop was to format this data as JSON because you knew you were eventually going to be writing a web application that would consume this data and turn it into a graphical report. Listing 1 shows some sample data that you will use in this article.

Listing 1. Sample browser usage data
[
    {"month" : "January 2010", "data": 
        {"IE8":5339680,"IPHONE":176397,"SAFARI":1161063,
        "FF35":5334121,"OTHER":1697189,"IE6":2355910,"OPERA":293024,
        "IE7":3448568,"FF3":1425939,"CHROME":1381381}},
    {"month" : "February 2010", "data": 
        {"IE8":4420267,"IPHONE":122378,"SAFARI":937765,
        "FF35":4904831,"OTHER":1249727,"IE6":1824138,"OPERA":261245,
        "IE7":2548741,"FF3":848517,"CHROME":1122684}},
    {"month" : "March 2010", "data" : 
        {"IE8":4832154,"IPHONE":124723,"SAFARI":1004835,
        "FF35":5240639,"OTHER":1443493,"IE6":1782140,"OPERA":288338,
        "IE7":2705560,"FF3":728227,"CHROME":1250771}},
    {"month" : "April 2010", "data" : 
        {"IE8":6014148,"IPHONE":153317,"SAFARI":1184909,
        "FF35":6355369,"IE6":2023596,"OTHER":1701331,"OPERA":336320,
        "IE7":3083772,"FF3":794613,"CHROME":1895022}},
    {"month" :"May 2010", "data" : 
        {"IE8":3985522,"IPHONE":107109,"SAFARI":826693,
        "FF35":4443157,"OTHER":1350928,"IE6":1169420,"OPERA":230201,
        "IE7":2032111,"FF3":471397,"CHROME":1358771}},
    {"month" :"June 2010", "data" : 
        {"IE8":4944664,"IPHONE":143594,"SAFARI":597916,
        "FF35":5396690,"OTHER":1740354,"IE6":1367462,"OPERA":264916,
        "IE7":2318786,"FF3":511660,"CHROME":1594828}}
]

This is data that can be generated by the Hadoop job developed in Part 1 of this series. In this example, you will include the data directly on a web page, but it would be easy to have it in a separate file that could be downloaded using Ajax. As you can see from Listing 1, you have six months of browser stats represented in JSON. This data can be very easily consumed by a web application to create a report.
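If you do keep the data in a separate file, a minimal sketch of loading it with Dojo's dojo.xhrGet might look like the following. The file name stats.json and the loadStats helper are assumptions for illustration and are not part of the article's download. The optional third parameter is also an assumption: it lets you pass a stand-in for dojo.xhrGet so that the function can be tried outside a browser.

```javascript
// Hypothetical helper: fetch the Listing 1 data from a separate JSON file.
// "xhrGet" is optional; when omitted, dojo.xhrGet is used (Dojo must be loaded).
function loadStats(url, onLoaded, xhrGet) {
    var args = {
        url: url,
        handleAs: "json",          // ask Dojo to parse the response as JSON
        load: function (data) {    // called with the parsed array of months
            onLoaded(data);
        },
        error: function (err) {    // called if the request or the parsing fails
            console.error("Could not load " + url, err);
        }
    };
    if (xhrGet) {
        xhrGet(args);
    } else {
        dojo.xhrGet(args);
    }
}

// In the page, something like:
// dojo.addOnLoad(function () {
//     loadStats("stats.json", function (data) { stats = data; init(); });
// });
```

With this approach the chart is created only after the data arrives, so init would be called from the callback rather than registered directly with dojo.addOnLoad.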

There are many excellent frameworks and libraries, both server-side and client-side, that you can use to create reports from this data. To make your report highly interactive, you want a client-side solution, and the Dojo toolkit is an excellent fit. It has both two-dimensional and three-dimensional charts. We will stick with the two-dimensional charts for this example. Listing 2 shows how to create the basic chart.

Listing 2. Creating a basic pie chart with Dojo
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop Reports</title>
<script type="text/javascript"
    src="http://ajax.googleapis.com/ajax/libs/dojo/1.4/dojo/dojo.xd.js"
    djConfig="parseOnLoad: true"></script>
<script type="text/javascript">
    dojo.require("dojox.charting.Chart2D");
    var pieChart = {};
      var stats; // see Listing 1 for the stats
    function init(){
        pieChart = new dojox.charting.Chart2D("pie-chart");
        pieChart.addPlot("default", {
            type: "Pie",
            radius: 300,
            fontColor: "black",
            labelOffset : "-50"
        });
        var lastMonth = stats[stats.length -1];
        pieChart.addSeries("browsers", makePieSeries(lastMonth.data));
        pieChart.render();
    }
    function makePieSeries(data){
        var series = [];
        var total = 0;
        var key = "";
        for (key in data){
            total += data[key];
        }
        var label = "";
        for (key in data){
            label = key;
            label += " : ";
            label += (data[key] * 100.0 / total).toFixed(2);
            label += "%";
            series.push({y:data[key], text:label});
        }
        return series;            
    }
    dojo.addOnLoad(init);
</script>
</head>
<body>
<div id="pie-chart" style="width: 800px; height: 750px;"></div>
</body>
</html>

The first thing to look at in Listing 2 is the script block. Notice that you use Dojo's package management system (dojo.require) to download Dojo's 2D chart object, dojox.charting.Chart2D. The last line of the script block uses the dojo.addOnLoad function to invoke a function (in this case the init function) when the web page finishes loading. In the init function, you create the pie chart and pass in the string pie-chart to the constructor. This tells Dojo to look for an HTML element with ID pie-chart and use this as the parent element of the chart that will be created. You can see that element in the HTML structure in Listing 2.

Going back to the init function, the next thing you do is call the addPlot method on the chart object that you created. Here you specify the options for the chart, including what kind of chart it will be. Dojo has support for many types of charts. In this case you specify Pie as the type. Many options are specific to the type of chart. For example, here you specify a radius for the pie chart, something that would not apply to other types of charts.

Next you call the addSeries method on the chart object. Here is where you pass in data from Listing 1. However, you need to massage this data slightly to make it work with Dojo's pie chart, which is what the makePieSeries function does. It takes in browser data and returns an array of simple objects. Each object has two properties: y and text. The y property is the value, and text is a label. Most of the code in the makePieSeries function is for creating a label that shows the name of the browser and its percentage of total hits.
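To see the shape of the objects that the pie chart receives, here is a standalone sketch of the series-building step that runs without Dojo. It rounds each percentage to two decimal places with toFixed to keep the labels short. The browser counts in the sample call are made up so that the shares are easy to verify by hand.

```javascript
// Standalone sketch of the series-building step (no Dojo required).
function makePieSeries(data) {
    var series = [];
    var total = 0;
    var key;
    // First pass: total hits across all browsers.
    for (key in data) {
        total += data[key];
    }
    // Second pass: one {y, text} object per browser.
    for (key in data) {
        var share = (data[key] * 100 / total).toFixed(2);
        series.push({ y: data[key], text: key + " : " + share + "%" });
    }
    return series;
}

// Made-up counts that add up to 100.
var sample = makePieSeries({ IE8: 50, FF35: 30, CHROME: 20 });
console.log(sample[0].text); // "IE8 : 50.00%"
```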

Going back to the init function, the last thing you do is call the render function on the chart. This is the function that actually causes Dojo to draw the chart on the web page. As mentioned, some of the plot and series options are different depending on the type of chart. However, the pattern of addPlot/addSeries/render is common. Figure 1 shows what the chart will look like.

Figure 1. Basic pie chart showing June browser stats
A basic pie chart that shows June browser stats

The chart in Figure 1 is a bare-bones Dojo pie chart. The only thing that is not default about it is that you have custom labels for the slices on the chart. It may not be the most beautiful chart that you have ever seen, but it graphically displays your business intelligence data, and it took very little code to produce. As you are about to see, it is fairly straightforward to make this chart a little more handsome.


Adding eye candy

Dojo makes it easy to create a basic pie chart, like the one you see in Figure 1. This might not be exactly what you want to show to the decision makers who need to see this chart. Fortunately, Dojo makes it easy to add some eye candy. Listing 3 shows some updates to the code from Listing 2 that will produce a more colorful and interactive pie chart.

Listing 3. Colorful pie chart code
dojo.require("dojox.charting.Chart2D");
dojo.require("dojox.charting.themes.Shrooms");
dojo.require("dojox.charting.action2d.MoveSlice");
dojo.require("dojox.charting.action2d.Tooltip");

function init(){
    pieChart = new dojox.charting.Chart2D("pie-chart");
    pieChart.addPlot("default", {
        type: "Pie",
        radius: 300,
        fontColor: "black",
        labelOffset : "-50"
    });
    var lastMonth = stats[stats.length -1];
    pieChart.addSeries("browsers", makePieSeries(lastMonth.data));
    var slice = 
          new dojox.charting.action2d.MoveSlice(pieChart,"default");
    var tip = new dojox.charting.action2d.Tooltip(pieChart,"default",{
        text : function(o) { 
            var run = o.run;
            var item = run.data[o.index];
            var label = item.text;
            var split = label.split(" : ");
            var browser = split[0];
            var percentage = split[1];
            var total = item.y;
            return browser + " : " + total + " (" + percentage + ")"; 
        }
    });
    pieChart.setTheme(dojox.charting.themes.Shrooms);
    pieChart.render();
}

The first thing you should notice in Listing 3 is that several new dojo.require calls have been added. You will see each of these objects referenced later in the code. The first one that is used is the MoveSlice object. This is a simple animation that makes a slice of the pie pop out when it is moused over. The next thing used is the Tooltip object. This allows for a tool tip, which is some extra text to be shown when a slice of the pie is moused over. The default text that will be shown is just the label for the particular slice of the pie. If you want to show something else, as is done here, you supply a function called text. This function produces the text that will be shown as the tool tip. In this case, it will show something like FF35 : 5334121 (27.48%). Finally, the last thing added was a call to your chart's setTheme method, passing in one of the many color themes that Dojo provides. This one provides some vibrant colors. Figure 2 shows the more colorful version of the pie chart.

Figure 2. Technicolor pie chart with flair
Same pie chart as in figure 1, but with color

Now that is more like it. As you can see from Listing 3, this theme is called Shrooms. You can probably guess why. If it's a little too bright for you, Dojo provides more than thirty themes. It is also fairly easy to create your own. Further, you do not have to use themes. You can directly specify the color for each slice of your pie chart. The charts shown in Figure 1 and Figure 2 both only show the June data from Listing 1. Let's take a look at how you can work with the other months of data.
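As a rough sketch of specifying slice colors directly, the function below attaches a color to each slice object before the series is handed to addSeries. It assumes that the Pie plot honors a color property on individual data points, which you should check against the charting documentation for your Dojo version. The colorSlices helper and the color values are hypothetical.

```javascript
// Hypothetical helper: copy each slice and add a "color" property.
// Slices whose browser has no entry in "colors" are left to the theme.
function colorSlices(series, colors) {
    var result = [];
    for (var i = 0; i < series.length; i++) {
        var slice = series[i];
        // The label format "BROWSER : share%" comes from makePieSeries.
        var browser = slice.text.split(" : ")[0];
        var copy = { y: slice.y, text: slice.text };
        if (colors.hasOwnProperty(browser)) {
            copy.color = colors[browser];
        }
        result.push(copy);
    }
    return result;
}

var colored = colorSlices(
    [{ y: 50, text: "IE8 : 50.00%" }, { y: 30, text: "FF35 : 30.00%" }],
    { IE8: "#3366cc" }
);
// colored[0].color is "#3366cc"; colored[1] has no color property.

// In the page, something like:
// pieChart.addSeries("browsers",
//     colorSlices(makePieSeries(lastMonth.data), myColors));
```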


Working with multiple series

You can easily take the code from Listing 3 and use it to render charts for each of the months of data from Listing 1. Each time, you would create a new pie chart, which is a little inefficient. There is a more elegant and efficient way to change the data that backs a chart. Listing 4 shows a modified version of the pie chart that allows the user to pick which series of data is shown.

Listing 4. Pie chart with data switching controls
function init(){
    // same as in Listing 3
    var chooser = dojo.byId("series-selector");
    var i = 0;
    var monthlyStats = null;
    var opt = null;
    for (i=0;i<stats.length;i++){
        monthlyStats = stats[i];
        opt = dojo.doc.createElement("option");
        opt.value = i;
        opt.appendChild(dojo.doc.createTextNode(monthlyStats.month));
        chooser.appendChild(opt);
    }
}
function aggregateResults(results){
    var aggResults = {};
    aggResults["IE"] = results.IE8 + results.IE7 + results.IE6;
    aggResults["FF"] = results.FF35 + results.FF3;
    aggResults["SAFARI"] = results.SAFARI + results.IPHONE;
    aggResults["CHROME"] = results.CHROME;
    aggResults["OPERA"] = results.OPERA;
    aggResults["OTHER"] = results.OTHER;
    return aggResults;
}
function selectSeries(){
    var selected = dojo.byId("series-selector").value;
    // Use "checked": the "value" of a check box is a non-empty string either way.
    var aggBox = dojo.byId("aggBox").checked;
    var series = stats[selected].data;
    if (aggBox){
        series = aggregateResults(series);
    }
    pieChart.updateSeries("browsers", makePieSeries(series));
    pieChart.render();
}
...
<div id="commandBar">
    <label for="series-selector">Choose Data:</label>
    <select name="series-selector" id="series-selector" 
           onchange="selectSeries()">
    </select>
    <label for="aggBox">Aggregate Data?</label>
    <input type="checkbox" id="aggBox" name="aggBox" 
          onchange="selectSeries()"/>
</div>

The code in Listing 4 adds some controls to the chart. First there is a drop-down box that lists each of the data series from Listing 1. Its options are populated dynamically in the init function. There is also a check box that lets the user specify whether the data should be aggregated, that is, whether the different versions of a browser (for example, Internet Explorer 6, 7, and 8) should be added together.

When either of these controls is changed, the selectSeries function is called. This function uses the values from the HTML control to determine which data series should be used. It then checks to see if the aggregate check-box is selected, and if so it applies the aggregateResults function to the selected series. Finally it updates the chart by invoking the updateSeries method on the chart and then calling render again. Figure 3 shows the pie chart with the new controls added to it.

Figure 3. Interactive pie chart
Colored pie chart with controls at the bottom for choosing which month's data to show and a check-box for whether to aggregate the data
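The aggregation step can also be driven by a lookup table instead of hand-written sums. The following is a sketch of that idea, not the code from the article's download: the GROUPS table and the aggregateByGroup function are hypothetical names. Unlike aggregateResults in Listing 4, this version treats a browser that is missing from a month as zero hits rather than producing NaN.

```javascript
// Which raw keys from Listing 1 roll up into which aggregate bucket.
var GROUPS = {
    IE: ["IE8", "IE7", "IE6"],
    FF: ["FF35", "FF3"],
    SAFARI: ["SAFARI", "IPHONE"],
    CHROME: ["CHROME"],
    OPERA: ["OPERA"],
    OTHER: ["OTHER"]
};

function aggregateByGroup(results, groups) {
    var agg = {};
    for (var name in groups) {
        var members = groups[name];
        var sum = 0;
        for (var i = 0; i < members.length; i++) {
            // A browser missing from this month counts as zero hits.
            sum += results[members[i]] || 0;
        }
        agg[name] = sum;
    }
    return agg;
}

// January 2010 figures from Listing 1.
var january = {
    "IE8": 5339680, "IPHONE": 176397, "SAFARI": 1161063,
    "FF35": 5334121, "OTHER": 1697189, "IE6": 2355910, "OPERA": 293024,
    "IE7": 3448568, "FF3": 1425939, "CHROME": 1381381
};
var aggregated = aggregateByGroup(january, GROUPS);
// aggregated.IE is 5339680 + 3448568 + 2355910 = 11144158
```

Adding a new browser version then means editing the table only, and the result can be passed to makePieSeries exactly as in selectSeries.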

Now you can switch data series and even apply aggregation to transform the data before it is used to create the pie chart. All of the themes and effects still apply, since it is still the same chart. You are just switching the data being used by the chart. This is a useful chart for seeing what browsers are most important to your site at any given point in time. However, if you want to see trends, such as which browsers are being used more or less on your site over time, a pie chart is not useful. Let's take a look at a different kind of chart from Dojo that you can use to see trends.


Trend analysis with multiple series

So far you have only seen pie charts, as they provide a nice visualization for the kind of data you are working with—showing the relative amount of browser share in a given month. To show trending data, that is, data over the course of multiple months, you need a different type of chart. A linear chart would seem like an obvious choice. However, you are going to need to transform your data to make it more usable for this kind of chart. Listing 5 shows the transformation code.

Listing 5. Data transformation code for trend analysis
var xStats = {};
function calcStats(){
    var i = 0;
    var mStats = null;
    var browser = "";
    var total = 0;
    for (i=0;i<stats.length;i++){
        mStats = (stats[i]).data;
        total = 0;
        for (browser in mStats){
            total += mStats[browser];
        }
        for (browser in mStats){
            if (!xStats[browser]){
                xStats[browser] = [];
            }
            xStats[browser].push(mStats[browser] / total);
        }
    }
}
dojo.addOnLoad(calcStats);

The first thing you do in Listing 5 is create a global variable for storing the new data, called xStats. You then define a function called calcStats that performs the necessary transformation. This function iterates over each month of data. First it sums up the total browser hits for that month. Then it iterates over the browsers again, calculating the browser's share for that month, and adds this to xStats. In the end, xStats will be a map, whose keys are browsers and whose values are an array of browser shares, from January to June for that browser. This is exactly the historical data you need for trending. Finally, you make another call to dojo.addOnLoad so that this function will be executed at start-up. Now you just need to turn it into a chart. Listing 6 shows how you can create a linear chart that displays this data.
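As a quick check of the transformation before charting it, here is a variant that returns the result instead of filling a global variable, so it can be run on its own without Dojo. The function name calcShares and the two-month sample are made up for illustration.

```javascript
// Variant of calcStats that returns the map instead of filling a global.
function calcShares(stats) {
    var shares = {};
    for (var i = 0; i < stats.length; i++) {
        var month = stats[i].data;
        var total = 0;
        var browser;
        // Total hits for this month.
        for (browser in month) {
            total += month[browser];
        }
        // Append this month's share to each browser's history.
        for (browser in month) {
            if (!shares[browser]) {
                shares[browser] = [];
            }
            shares[browser].push(month[browser] / total);
        }
    }
    return shares;
}

// Two made-up months with two browsers each.
var demo = calcShares([
    { month: "January 2010", data: { IE8: 60, FF35: 40 } },
    { month: "February 2010", data: { IE8: 50, FF35: 50 } }
]);
// demo is { IE8: [0.6, 0.5], FF35: [0.4, 0.5] }
```

Note that the stored values are fractions between 0 and 1. Multiply by 100 if you would rather plot percentages.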

Listing 6. Creating a linear chart of trending data
dojo.require("dojox.charting.widget.Legend");
function makeTrends(){
    var chart = new dojox.charting.Chart2D("trends");
    chart.addPlot("default", {
        type: "Lines",
        markers : true,
        tension : "S",
        lines : true,
        labelOffset : -30,
        shadows : {dx:2, dy:2, dw:2}
    });
    chart.addAxis("x");
    chart.addAxis("y", {vertical:true});
    var browser = "";
    for (browser in xStats){
        chart.addSeries(browser, makeSeries(xStats[browser]));
    }
    chart.setTheme(dojox.charting.themes.Shrooms);
    chart.render();
    var legend =  new dojox.charting.widget.Legend({chart: chart}, 
          "legend");
}    
function makeSeries(data){
    var series = [];
    var i = 0;
    for (i = 0; i < data.length; i++){
        // x is the 1-based month number, y is that month's share
        series.push({x: i + 1, y: data[i]});
    }
    return series;        
}
dojo.addOnLoad(makeTrends);
<div id="trends" style="width: 800px; height: 800px;"></div>
<div id="legend"></div>

Once again your code starts by adding a new Dojo dependency. This time it is the dojox.charting.widget.Legend object (more on this in a moment). Next you create a function called makeTrends that will create the new chart. This is similar to the code for creating a pie chart, but this time the plot type is Lines. A few of the plot options are specific to line charts, like markers (show data points), tension (make the lines curvy), and shadows (give each line a drop shadow). Notice that you also add two axes, creatively called x and y. Next you add a series for each of the browsers you stored in the xStats object. For each browser, you make a call to the makeSeries function. This essentially transforms the data into (x,y) pairs, where x is the month (1=January, 2=February, and so on) and y is the browser's share of that month's hits, expressed as a fraction between 0 and 1. Going back to makeTrends, you call the setTheme method, once again going with the Shrooms theme, and then you call render. Finally, you create a legend using the Legend object mentioned earlier. Figure 4 shows the result.

Figure 4. Data trending chart
A colored line graph showing trending with a legend at the bottom
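The x axis in this chart is numbered by month index. One possible refinement, sketched below, is to build text labels from the month names in Listing 1. This assumes that the axis accepts a labels option containing an array of {value, text} objects, which you should confirm in the charting documentation for your Dojo version. The makeMonthLabels helper is hypothetical.

```javascript
// Hypothetical helper: build axis labels from the month names in Listing 1.
function makeMonthLabels(stats) {
    var labels = [];
    for (var i = 0; i < stats.length; i++) {
        // The x values produced by makeSeries start at 1, so label i + 1.
        labels.push({ value: i + 1, text: stats[i].month });
    }
    return labels;
}

var labels = makeMonthLabels([
    { month: "January 2010", data: {} },
    { month: "February 2010", data: {} }
]);
// labels is [{value: 1, text: "January 2010"},
//            {value: 2, text: "February 2010"}]

// In makeTrends, something like:
// chart.addAxis("x", { labels: makeMonthLabels(stats) });
```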

With this chart, you can see trends like Firefox® 3.5 and Internet Explorer® 8 both growing market share over time, while older versions of Internet Explorer and Firefox are dropping off. There are many other chart types that you can experiment with using this code. For example, the StackedLines type is interesting as well. You can also combine some of the techniques shown earlier, like changing the data series (adding an aggregate view, for example).


Conclusion

This article has shown you the basics of creating interactive charts using the Dojo toolkit. You have seen just two of the many types of charts that Dojo supports. Many of the types have several unique features that can be used to really make them visually appealing. Take a look at the documentation from Dojo to see what options you can use to create the most striking charts for your reporting application.


Download

Description           Name          Size
Article source code   reports.zip   3KB

Resources

Get products and technologies

  • Get Apache Hadoop. This article used version 0.20.
  • Get the Java SDK. JDK 1.6.0_17 was used in this article.
  • Download IBM InfoSphere BigInsights Basic Edition at no charge and build a solution that turns large, complex volumes of data into insight by combining Apache Hadoop with unique technologies and capabilities from IBM.
  • Google Libraries API: Use Google's Ajax APIs to download Dojo to your page.
  • Take a look at the Pig and Hive frameworks that are built on top of Hadoop.
  • Download IBM product evaluation versions, and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
