In the first part of this series, you saw how to crunch big data using Apache Hadoop. In that example, you used Hadoop to process Apache web server access logs and turn them into business intelligence data showing which web browsers visitors use to access your site. You also had Hadoop format this data as JSON, because you knew you would eventually write a web application that consumes the data and turns it into a graphical report. Listing 1 shows some sample data that you will use in this article.
Listing 1. Sample browser usage data
[
{"month" : "January 2010", "data":
{"IE8":5339680,"IPHONE":176397,"SAFARI":1161063,
"FF35":5334121,"OTHER":1697189,"IE6":2355910,"OPERA":293024,
"IE7":3448568,"FF3":1425939,"CHROME":1381381}},
{"month" : "February 2010", "data":
{"IE8":4420267,"IPHONE":122378,"SAFARI":937765,
"FF35":4904831,"OTHER":1249727,"IE6":1824138,"OPERA":261245,
"IE7":2548741,"FF3":848517,"CHROME":1122684}},
{"month" : "March 2010", "data" :
{"IE8":4832154,"IPHONE":124723,"SAFARI":1004835,
"FF35":5240639,"OTHER":1443493,"IE6":1782140,"OPERA":288338,
"IE7":2705560,"FF3":728227,"CHROME":1250771}},
{"month" : "April 2010", "data" :
{"IE8":6014148,"IPHONE":153317,"SAFARI":1184909,
"FF35":6355369,"IE6":2023596,"OTHER":1701331,"OPERA":336320,
"IE7":3083772,"FF3":794613,"CHROME":1895022}},
{"month" :"May 2010", "data" :
{"IE8":3985522,"IPHONE":107109,"SAFARI":826693,
"FF35":4443157,"OTHER":1350928,"IE6":1169420,"OPERA":230201,
"IE7":2032111,"FF3":471397,"CHROME":1358771}},
{"month" :"June 2010", "data" :
{"IE8":4944664,"IPHONE":143594,"SAFARI":597916,
"FF35":5396690,"OTHER":1740354,"IE6":1367462,"OPERA":264916,
"IE7":2318786,"FF3":511660,"CHROME":1594828}}
]
This is data that can be generated by the Hadoop job developed in Part 1 of this series. In this example, you will include the data directly on a web page, but it would be easy to have it in a separate file that could be downloaded using Ajax. As you can see from Listing 1, you have six months of browser stats represented in JSON. This data can be very easily consumed by a web application to create a report.
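As a rough illustration of how a page might consume this data, the sketch below parses a shortened copy of Listing 1 (two months and three browsers, with values copied from the listing) and totals the hits for the most recent month. The totalHits helper is a hypothetical name used only for this example.

```javascript
// Shortened copy of Listing 1: two months, three browsers each.
var json = '[' +
  '{"month": "May 2010", "data": {"IE8": 3985522, "FF35": 4443157, "CHROME": 1358771}},' +
  '{"month": "June 2010", "data": {"IE8": 4944664, "FF35": 5396690, "CHROME": 1594828}}' +
  ']';

var stats = JSON.parse(json);

// Hypothetical helper: sum the hit counts in one month's "data" object.
function totalHits(data) {
  var total = 0;
  for (var key in data) {
    if (data.hasOwnProperty(key)) {
      total += data[key];
    }
  }
  return total;
}

// The most recent month is the last element of the array.
var lastMonth = stats[stats.length - 1];
console.log(lastMonth.month + ": " + totalHits(lastMonth.data) + " hits");
```

Because the months are stored in order, picking the last array element is enough to find the latest one; the same idea is used for the first pie chart.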
There are many excellent frameworks and libraries, both server-side and client-side, that you can use to create reports from this data. To make your report highly interactive, you want a client-side solution, and the Dojo toolkit is an excellent fit. It has both two-dimensional and three-dimensional charts. This example sticks with the two-dimensional charts. Listing 2 shows how to create the basic chart.
Listing 2. Creating a basic pie chart with Dojo
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop Reports</title>
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/dojo/1.4/dojo/dojo.xd.js"
djConfig="parseOnLoad: true"></script>
<script type="text/javascript">
dojo.require("dojox.charting.Chart2D");
var pieChart = {};
var stats; // see Listing 1 for the stats
function init(){
pieChart = new dojox.charting.Chart2D("pie-chart");
pieChart.addPlot("default", {
type: "Pie",
radius: 300,
fontColor: "black",
labelOffset : "-50"
});
var lastMonth = stats[stats.length -1];
pieChart.addSeries("browsers", makePieSeries(lastMonth.data));
pieChart.render();
}
function makePieSeries(data){
var series = [];
var total = 0;
var key = "";
for (key in data){
total += data[key];
}
var label = "";
for (key in data){
label = key;
label += " : ";
label += (data[key] * 100.0 / total).toFixed(2); // round to two decimal places
label += "%";
series.push({y:data[key], text:label});
}
return series;
}
dojo.addOnLoad(init);
</script>
</head>
<body>
<div id="pie-chart" style="width: 800px; height: 750px;"></div>
</body>
</html>
The first thing to look at in Listing 2 is the script block. Notice that
you use Dojo's package management system
(dojo.require) to download Dojo's 2D chart
object, dojox.charting.Chart2D. The last line
of the script block uses the dojo.addOnLoad
function to invoke a function (in this case the
init function) when the web page finishes
loading. In the init function, you create the
pie chart and pass in the string pie-chart to
the constructor. This tells Dojo to look for an HTML element with ID
pie-chart and use this as the parent element of the chart that will be
created. You can see that element in the HTML structure in Listing 2.
Going back to the init function, the next thing
you do is call the addPlot method on the
chart object that you created. Here you specify
the options for the chart, including what kind of chart it will be. Dojo
has support for many types of charts. In this case you specify
Pie as the type. Many options are specific to
the type of chart. For example, here you specify a radius for the pie
chart, something that would not apply to other types of charts.
Next you call the addSeries method on the
chart object. Here is where you pass in data
from Listing 1. However, you need to massage this data slightly to make it
work perfectly with Dojo's pie chart, which is what the
makePieSeries function does. It takes in
browser data and returns an array of simple objects. Each object has two
properties: y and
text. The y property
is the value, and text is a label. Most of the code in the
makePieSeries function is for creating a label
that shows the name of the browser and its percentage
of total hits.
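To make the shape of the series concrete, the sketch below runs a standalone copy of makePieSeries on made-up counts that are easy to check by hand. Formatting each percentage with toFixed(2) is a choice made for this example so that the labels stay short.

```javascript
// Standalone copy of makePieSeries; percentages are shown with two decimals.
function makePieSeries(data) {
  var series = [];
  var total = 0;
  var key;
  // First pass: total hits across all browsers.
  for (key in data) {
    total += data[key];
  }
  // Second pass: one {y, text} object per browser.
  for (key in data) {
    var percent = (data[key] * 100 / total).toFixed(2);
    series.push({ y: data[key], text: key + " : " + percent + "%" });
  }
  return series;
}

// Made-up counts (they sum to 100) so the percentages are obvious.
var sample = makePieSeries({ IE8: 50, FF35: 30, CHROME: 20 });
console.log(sample[0].text); // IE8 : 50.00%
console.log(sample[1].y);    // 30
```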
Going back to the init function, the last thing
you do is call the render function on the
chart. This is the function that actually causes Dojo to draw the chart on
the web page. As mentioned, some of the plot and series options are
different depending on the type of chart. However, the pattern of
addPlot/addSeries/render is common. Figure 1 shows what the chart will look like.
Figure 1. Basic pie chart showing June browser stats
The chart in Figure 1 is a bare-bones Dojo pie chart. The only thing that is not default about it is that you have custom labels for the slices on the chart. It may not be the most beautiful chart that you have ever seen, but it graphically displays your business intelligence data, and it took very little code to produce. As you are about to see, it is fairly straightforward to make this chart a little more handsome.
Dojo makes it easy to create a basic pie chart, like the one you see in Figure 1. This might not be exactly what you want to show to the decision makers who need to see this chart. Fortunately, Dojo makes it easy to add some eye candy. Listing 3 shows some updates to the code from Listing 2 that will produce a more colorful and interactive pie chart.
Listing 3. Colorful pie chart code
dojo.require("dojox.charting.Chart2D");
dojo.require("dojox.charting.themes.Shrooms");
dojo.require("dojox.charting.action2d.MoveSlice");
dojo.require("dojox.charting.action2d.Tooltip");
function init(){
pieChart = new dojox.charting.Chart2D("pie-chart");
pieChart.addPlot("default", {
type: "Pie",
radius: 300,
fontColor: "black",
labelOffset : "-50"
});
var lastMonth = stats[stats.length -1];
pieChart.addSeries("browsers", makePieSeries(lastMonth.data));
var slice =
new dojox.charting.action2d.MoveSlice(pieChart,"default");
var tip = new dojox.charting.action2d.Tooltip(pieChart,"default",{
text : function(o) {
var run = o.run;
var item = run.data[o.index];
var label = item.text;
var split = label.split(" : ");
var browser = split[0];
var percentage = split[1];
var total = item.y;
return browser + " : " + total + " (" + percentage + ")";
}
});
pieChart.setTheme(dojox.charting.themes.Shrooms);
pieChart.render();
}
The first thing you should notice in Listing 3 is that several new
dojo.require calls have been added. You will
see each of these objects referenced later in the code. The first one that
is used is the MoveSlice object. This is a simple
animation that makes a slice of the pie pop out when it is moused
over. The next thing used is the Tooltip
object. This allows for a tool tip, which is some extra text to be shown
when a slice of the pie is moused over. The default text that will be
shown is just the label for the particular slice of the pie. If you want
to show something else, as is done here, you supply a function called
text. This function produces the text that will
be shown as the tool tip. In this case, it will show something like
FF35 : 5334121 (27.48%). Finally, the last
thing added was a call to your chart's setTheme
method, passing in one of the many color themes that Dojo provides. This
one provides some vibrant colors. Figure 2 shows the
more colorful version of the pie chart.
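The tooltip text function can be tried outside the browser. The sketch below is a standalone copy of it, called with a hand-built stand-in for the object that Dojo passes in. Only the two fields the function reads (run.data and index) are filled in; the real object carries more. The names tooltipText and fakeEvent are hypothetical.

```javascript
// Standalone copy of the tooltip text function from Listing 3.
function tooltipText(o) {
  var item = o.run.data[o.index];        // the {y, text} object for this slice
  var parts = item.text.split(" : ");    // ["FF35", "28.58%"]
  return parts[0] + " : " + item.y + " (" + parts[1] + ")";
}

// Hand-built stand-in for the argument; the label uses a rounded percentage.
var fakeEvent = {
  index: 0,
  run: { data: [{ y: 5396690, text: "FF35 : 28.58%" }] }
};

console.log(tooltipText(fakeEvent)); // FF35 : 5396690 (28.58%)
```

Splitting the label works only because the label format is under your control; if you change the separator in makePieSeries, the split string here has to change with it.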
Figure 2. Technicolor pie chart with flair
Now that is more like it. As you can see from Listing 3, this theme is called Shrooms. You can probably guess why. If it's a little too bright for you, Dojo provides more than thirty themes. It is also fairly easy to create your own. Further, you do not have to use themes. You can directly specify the color for each slice of your pie chart. The charts shown in Figure 1 and Figure 2 both only show the June data from Listing 1. Let's take a look at how you can work with the other months of data.
You can easily take the code from Listing 3 and use it to render charts for each of the months of data from Listing 1. However, you would create a new pie chart each time, which is a little inefficient. There is a more elegant and efficient way to change the data that backs a chart. Listing 4 shows a modified version of the pie chart that allows the user to pick which series of data is shown.
Listing 4. Pie chart with data switching controls
function init(){
// same as in Listing 3
var chooser = dojo.byId("series-selector");
var i = 0;
var monthlyStats = null;
var opt = null;
for (i=0;i<stats.length;i++){
monthlyStats = stats[i];
opt = dojo.doc.createElement("option");
opt.value = i;
opt.appendChild(dojo.doc.createTextNode(monthlyStats.month));
chooser.appendChild(opt);
}
}
function aggregateResults(results){
var aggResults = {};
aggResults["IE"] = results.IE8 + results.IE7 + results.IE6;
aggResults["FF"] = results.FF35 + results.FF3;
aggResults["SAFARI"] = results.SAFARI + results.IPHONE;
aggResults["CHROME"] = results.CHROME;
aggResults["OPERA"] = results.OPERA;
aggResults["OTHER"] = results.OTHER;
return aggResults;
}
function selectSeries(){
var selected = dojo.byId("series-selector").value;
var aggBox = dojo.byId("aggBox").checked; // a checkbox's value is truthy even when it is not ticked
var series = stats[selected].data;
if (aggBox){
series = aggregateResults(series);
}
pieChart.updateSeries("browsers", makePieSeries(series));
pieChart.render();
}
...
<div id="commandBar">
<label for="series-selector">Choose Data:</label>
<select name="series-selector" id="series-selector"
onchange="selectSeries()">
</select>
<label for="aggBox">Aggregate Data?</label>
<input type="checkbox" id="aggBox" name="aggBox"
onchange="selectSeries()"/>
</div>
The code in Listing 4 adds some controls to the chart. First there is a
drop-down box that shows each of the data series from Listing 1. This data
is dynamically loaded in the init function.
There is also a check-box that allows the user to specify if the data
should be aggregated, that is, whether related browsers should be added together
(for example, all of the versions of Internet Explorer).
When either of these controls is changed, the
selectSeries function is called. This function
uses the values from the HTML control to determine which data series
should be used. It then checks to see if the aggregate check-box is
selected, and if so it applies the
aggregateResults function to the selected
series. Finally it updates the chart by invoking the
updateSeries method on the chart and then
calling render again. Figure 3 shows the pie chart with
the new controls added to it.
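The selection logic can be separated from the page and checked on its own. The sketch below is one possible variant, not the code from the article download: it drives the aggregation from a lookup table (the hypothetical GROUPS object) and takes the check-box state as a plain boolean argument. In a browser, that boolean would come from the check-box's checked property.

```javascript
// Hypothetical lookup table: which raw keys roll up into which group.
var GROUPS = {
  IE: ["IE8", "IE7", "IE6"],
  FF: ["FF35", "FF3"],
  SAFARI: ["SAFARI", "IPHONE"],
  CHROME: ["CHROME"],
  OPERA: ["OPERA"],
  OTHER: ["OTHER"]
};

function aggregateResults(results) {
  var agg = {};
  for (var group in GROUPS) {
    agg[group] = 0;
    for (var i = 0; i < GROUPS[group].length; i++) {
      // A browser that is missing from a month counts as zero hits.
      agg[group] += results[GROUPS[group][i]] || 0;
    }
  }
  return agg;
}

// Pick the data for the chart; "aggregate" plays the role of the check-box.
function chooseSeries(stats, selectedIndex, aggregate) {
  var data = stats[selectedIndex].data;
  return aggregate ? aggregateResults(data) : data;
}

// January 2010 values from Listing 1.
var stats = [{ month: "January 2010", data: {
  IE8: 5339680, IPHONE: 176397, SAFARI: 1161063, FF35: 5334121,
  OTHER: 1697189, IE6: 2355910, OPERA: 293024, IE7: 3448568,
  FF3: 1425939, CHROME: 1381381 } }];

console.log(chooseSeries(stats, 0, true).IE); // 11144158
```

Keeping the grouping in a table means a new browser version only needs a new entry in GROUPS rather than a change to the function body.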
Figure 3. Interactive pie chart
Now you can switch data series and even apply aggregation to transform the data before it is used to create the pie chart. All of the themes and effects still apply, since it is still the same chart. You are just switching the data being used by the chart. This is a useful chart for seeing what browsers are most important to your site at any given point in time. However, if you want to see trends, such as which browsers are being used more or less on your site over time, a pie chart is not useful. Let's take a look at a different kind of chart from Dojo that you can use to see trends.
Trend analysis with multiple series
So far you have only seen pie charts, as they provide a nice visualization for the kind of data you are working with: the relative browser share in a given month. To show trending data, that is, data over the course of multiple months, you need a different type of chart. A line chart is an obvious choice. However, you need to transform your data to make it more usable for this kind of chart. Listing 5 shows the transformation code.
Listing 5. Data transformation code for trend analysis
var xStats = {};
function calcStats(){
var i = 0;
var mStats = null;
var browser = "";
var total = 0;
for (i=0;i<stats.length;i++){
mStats = (stats[i]).data;
total = 0;
for (browser in mStats){
total += mStats[browser];
}
for (browser in mStats){
if (!xStats[browser]){
xStats[browser] = [];
}
xStats[browser].push(mStats[browser] / total);
}
}
}
dojo.addOnLoad(calcStats);
The first thing you do in Listing 5 is create a global variable for storing
the new data, called xStats. You then define a
function called calcStats that performs the
necessary transformation. This function iterates over each month of data.
First it sums up the total browser hits for that month. Then it iterates
over the browsers again, calculating the browser's share for that month,
and adds this to xStats. In the end,
xStats will be a map, whose keys are browsers
and whose values are an array of browser shares, from January to June for
that browser. This is exactly the historical data you need for trending.
Finally, you make another call to
dojo.addOnLoad so that this function will be
executed at start-up. Now you just need to turn it into a chart. Listing 6 shows how you can create a line chart that
displays this data.
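To see what the transformation produces, the sketch below applies the same logic to made-up counts that are easy to verify by hand. It is written as a function that returns the map instead of filling a global variable, which is a change made here so that it can be run in isolation; calcShares is a hypothetical name.

```javascript
// Pure-function variant of calcStats: returns the map of browser -> shares.
function calcShares(stats) {
  var shares = {};
  for (var i = 0; i < stats.length; i++) {
    var monthData = stats[i].data;
    var total = 0;
    var browser;
    // Total hits for this month.
    for (browser in monthData) {
      total += monthData[browser];
    }
    // Append this month's share to each browser's array.
    for (browser in monthData) {
      if (!shares[browser]) {
        shares[browser] = [];
      }
      shares[browser].push(monthData[browser] / total);
    }
  }
  return shares;
}

// Made-up counts: each month sums to 100.
var sampleStats = [
  { month: "Month 1", data: { A: 75, B: 25 } },
  { month: "Month 2", data: { A: 50, B: 50 } }
];
var shares = calcShares(sampleStats);
console.log(shares.A); // [ 0.75, 0.5 ]
console.log(shares.B); // [ 0.25, 0.5 ]
```

Each array holds one entry per month, in the same order as the input, which is the layout a line chart series needs.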
Listing 6. Creating a line chart of trending data
dojo.require("dojox.charting.widget.Legend");
function makeTrends(){
var chart = new dojox.charting.Chart2D("trends");
chart.addPlot("default", {
type: "Lines",
markers : true,
tension : "S",
lines : true,
labelOffset : -30,
shadows : {dx:2, dy:2, dw:2}
});
chart.addAxis("x");
chart.addAxis("y", {vertical:true});
var browser = "";
for (browser in xStats){
chart.addSeries(browser, makeSeries(xStats[browser]));
}
chart.setTheme(dojox.charting.themes.Shrooms);
chart.render();
var legend = new dojox.charting.widget.Legend({chart: chart},
"legend");
}
function makeSeries(data){
var series = [];
var i = 0;
// data is an array, so use an indexed loop rather than for...in
for (i = 0; i < data.length; i++){
series.push({x: i + 1, y: data[i]});
}
return series;
}
dojo.addOnLoad(makeTrends);
...
<div id="trends" style="width: 800px; height: 800px;"></div>
<div id="legend"></div>
Once again your code starts by adding a new Dojo dependency. This time it
is the dojox.charting.widget.Legend object (more on this in a moment). Next you create a function called
makeTrends that will create the new chart. This
is similar to the code for creating a pie chart. This time the plot
type is Lines. There are a few
options that are specific to a line chart, like markers (show data
points), tension (make the lines curvy), and shadows (give each line a
drop-shadow). Notice that you also add two axes, creatively called x and
y. Next you add a series for each of the browsers you stored in the
xStats object. For each browser, you make a
call to the makeSeries function. This
essentially transforms the data into (x,y) pairs, where x is the month
(1=January, 2=February, and so on) and y is the browser's share of that
month's total hits, expressed as a fraction between 0 and 1. Going back to
makeTrends, you call the
setTheme method, once again going with the
Shrooms, and then you call render. Finally, you
create a legend using the Legend object
mentioned earlier. Figure 4 shows the result.
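If you would rather plot percentages and show month names along the x axis, the sketch below shows one way to prepare the data. Both helpers (makePercentSeries and makeMonthLabels) are hypothetical names, and the {value, text} shape of the labels is an assumption about what the axis accepts, so check the charting documentation for your Dojo version before relying on it.

```javascript
// Turn an array of shares (fractions between 0 and 1) into (x, y) points,
// with x as the 1-based month number and y as a percentage.
function makePercentSeries(shares) {
  var series = [];
  for (var i = 0; i < shares.length; i++) {
    series.push({ x: i + 1, y: shares[i] * 100 });
  }
  return series;
}

// Build {value, text} pairs that map each x position to its month name.
function makeMonthLabels(stats) {
  var labels = [];
  for (var i = 0; i < stats.length; i++) {
    labels.push({ value: i + 1, text: stats[i].month });
  }
  return labels;
}

var points = makePercentSeries([0.25, 0.5]);
var labels = makeMonthLabels([{ month: "January 2010" }, { month: "February 2010" }]);
console.log(points[1].x, points[1].y); // 2 50
console.log(labels[0].text);           // January 2010
```

In the page, the labels array would be handed to the x axis as an option when calling addAxis, assuming your Dojo version supports custom axis labels in this form.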
Figure 4. Data trending chart
With this chart, you can see trends like Firefox® 3.5 and Internet
Explorer® 8 both growing market share over time, while older versions of Internet
Explorer and Firefox are dropping off. There are many other chart types
that you can experiment with using this code. For example, the
StackedLines type is interesting as well. You can also combine some of the techniques shown earlier, like
changing the data series (adding an aggregate view, for example).
This article has shown you the basics of creating interactive charts using the Dojo toolkit. You have seen just two of the many types of charts that Dojo supports. Many of the types have several unique features that can be used to really make them visually appealing. Take a look at the documentation from Dojo to see what options you can use to create the most striking charts for your reporting application.
| Description | Name | Size | Download method |
|---|---|---|---|
| Article source code | reports.zip | 3KB | HTTP |
Learn
- "Writing a custom Dojo application" (Wendi Nusbickel and Melissa Betancourt, developerWorks, December 2008): Find out much more about Dojo.
- "Develop HTML widgets with Dojo" (Igor Kusakov, developerWorks, October 2006): Explore Dojo's extensibility.
- "Distributed data processing with Hadoop, Part 1" and Part 2 (M. Tim Jones, developerWorks, May 2010): Check out these articles for a more detailed exploration of Hadoop.
- "Deriving new business insights with Big Data" (Stephen Watt, developerWorks, June 2010): See how Hadoop can provide insights into your business.
- "Distributed computing with Linux and Hadoop" (Ken Mann and M. Tim Jones, developerWorks, December 2008): Learn more about the inner workings of Hadoop.
- The developerWorks Web development zone specializes in articles covering various web-based solutions.
- IBM InfoSphere BigInsights Basic Edition, IBM's Hadoop distribution, is an integrated, tested and pre-configured, no-charge download for anyone who wants to experiment with and learn about Hadoop.
- Find free courses on Hadoop fundamentals, stream computing, text analytics, and more at Big Data University.
Get products and technologies
- Get Apache Hadoop. This article used version 0.20.
- Get the Java SDK. JDK 1.6.0_17 was used in this article.
- Download IBM InfoSphere BigInsights Basic Edition at no charge and build a solution that turns large, complex volumes of data into insight by combining Apache Hadoop with unique technologies and capabilities from IBM.
- Google Libraries API: Use Google's Ajax APIs to download Dojo to your page.
- Take a look at the Pig and Hive frameworks that are built on top of Hadoop.
- Download IBM product evaluation versions, and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.

Michael Galpin is an architect at eBay and a frequent contributor to developerWorks. He has spoken at various technical conferences, including JavaOne, EclipseCon, and AjaxWorld. To get a preview of his next project, follow @michaelg on Twitter.




