In this article, we illustrate how to use three of the Google Code APIs by coding an interesting baseball hack. The format of this article and the application domain of our code are inspired by Joseph Adler's excellent book, Baseball Hacks (see Resources). We use three Google Code APIs: the Google Chart API, the Google Spreadsheets Data API, and the Google Gadgets API. We won't cover the APIs comprehensively. We hope to show enough to pique your interest. You can consult the Google documentation and tutorials for more details. Our goal is to build a Google Gadget that displays up-to-date batting statistics of your favorite Major League Baseball team. We do this in three steps.
First, we download current baseball statistics from a baseball Web site and store them in comma-delimited text files. Second, we upload these stats to a Google Spreadsheet using the Google Spreadsheets Data API. Third, we write a Google Gadget that reads the stats from the spreadsheet and displays them as a bar chart using the Google Chart API.
Scraping statistics from the Web
The first step is to download the stats. Our code for this was
originally based on "Hack #25: Spider Baseball Sites for Data" in Baseball
Hacks. We won't go into any details because it's explained well in the book.
(Our code is in the get_all_teams_hitting.pl Perl program
and GetStats.pm Perl module included in the sample code
that you can download for this article.) A few points are worth mentioning,
though.
First, while there are plenty of RSS and Atom feeds for the results of games (box scores, that is), there do not seem to be any RSS or Atom feeds for up-to-date player stats. If there were, obtaining the stats from such Web feeds would be the way to go. So, instead, we were required to scrape HTML pages for our data. Having to do this is never pretty and is fragile. If the format of the HTML is suddenly changed, your code could easily be broken. The second point is that you shouldn't scrape data from a site without first checking whether it's okay with the site owner. Check out the "Terms of Use" of the site. Third, if it is okay to scrape data from the site, don't be greedy. Put a pause of a second or two between successive page hits. Not only is this the polite thing to do, but, if you don't, you may find yourself locked out of the site. (This happened to us when testing some code that didn't have pauses in it.)
Programmatically uploading statistics to a Google Spreadsheet
After we've scraped HTML pages, downloaded the stats, and stored them in comma-delimited text files, one for each team, we want to store them in a Google Spreadsheet that will be accessed later by our Google Gadget. The Google Spreadsheets Data API lets you add to, change, and delete the contents of a Google Spreadsheet either at the row level or the cell level. The Google Spreadsheet Data API is available for Java™, .NET, PHP, and Python. We used Python because we happen to like Python.
In Listing 1, our Python code loops
over the set of comma-delimited text files and then calls the code that actually
uploads the stats to the spreadsheet. (We also use another data file, MLBTeams.csv, that contains a list of the teams.)
Listing 1. Partial listing of upload_all_teams_hitting.py
mlb_teams = open('../../data/MLBTeams.csv')
try:
lines = mlb_teams.readlines( );
finally:
mlb_teams.close( )
# Skip header.
for line in lines[1:]:
line = line.rstrip();
if line != '':
parts = line.split(",")
curr_wksht_id = parts[1].rstrip()
teamAbbrev = parts[0].rstrip()
print "Updating: " + teamAbbrev
infile = "../../tempData/" + teamAbbrev + "Hitting.txt"
uploader = Uploader(user, pw, infile)
if curr_key != '':
uploader.set_spreadsheet(curr_key)
if curr_wksht_id != '':
uploader.set_worksheet(curr_wksht_id)
uploader.prompt_delete_old_rows_and_add_new_rows() |
You can programmatically interact with Google Spreadsheets in one of two ways: a row-based feed or a cell-based feed. In this application, since we're usually just deleting the rows of old data and creating rows with the new data, we mainly use a row-based feed.
Here are the gdata.spreadsheet
methods and fields we use:
-
SpreadsheetsListFeed: Lets you work with spreadsheets at the row level -
SpreadsheetsList: A single row in a worksheet -
SpreadsheetsCellsFeed: Lets you work with spreadsheets at the cell level-
row_count.text: For getting the number of rows in a worksheet
-
-
service.SpreadsheetsService: The client API object for accessing the spreadsheet-
email: A Gmail e-mail address -
password: Gmail password -
source: The name of the application accessing the API -
ProgrammaticLogin(): Logs you into your Google account -
GetCellsFeed(key, wksht_id): Gives you a cell feed (SpreadsheetsCellsFeed) of a spreadsheet. You must specify the ID of the spreadsheet and the ID of the worksheet within the spreadsheet. -
GetListFeed(key, wksht_id): Gives you a list feed (SpreadsheetsListFeed) of a spreadsheet. You must specify the ID of the spreadsheet and the ID of the worksheet within the spreadsheet. -
InsertRow(row_data, key, wksht_id): Adds a new row to the spreadsheet -
DeleteRow(entry): Deletes aSpreadsheetsList(which is a row)
-
Listing 2 contains the code that uploads our
batting stats to a Google Spreadsheet. First, we delete all the rows in the current
worksheet (in _delete_and_add_rows), and then we add new
rows with data from a .csv file (in _fill_in_rows).
Listing 2. Partial listing of upload_stats.py
import gdata.spreadsheet.service
class Uploader(CRUD):
"""
Deletes old contents of a Google Worksheet and adds new contents from a .csv
file.
"""
def __init__(self, email, password, infile):
"Initialize attributes of an Uploader."
self.gd_client = gdata.spreadsheet.service.SpreadsheetsService()
self.gd_client.email = email
self.gd_client.password = password
self.gd_client.source = 'Upload Batting Stats'
self.gd_client.ProgrammaticLogin()
self.curr_key = ''
self.curr_wksht_id = ''
self.list_feed = None
self.infile = infile
def prompt_delete_old_rows_and_add_new_rows(self):
"""
Delete old rows and add new ones, but prompt user for spreadsheet and
worksheet IDs if they're not already set.
"""
if not self.curr_key:
self._PromptForSpreadsheet()
if not self.curr_wksht_id:
self._PromptForWorksheet()
self._delete_and_add_rows()
def _delete_and_add_rows(self):
"""
First, delete all the rows (except the header row) in the spreadsheet
and then call method to add rows.
"""
cells_feed = self.gd_client.GetCellsFeed(self.curr_key, self.curr_wksht_id)
row_count = int(cells_feed.row_count.text)
while (row_count > 1):
self._list_delete_action(0)
cells_feed = self.gd_client.GetCellsFeed(self.curr_key, self.curr_wksht_id)
row_count = int(cells_feed.row_count.text)
self._fill_in_rows()
def _list_delete_action(self, index):
"Delete the row with the given index."
self.list_feed = self.gd_client.GetListFeed(self.curr_key, self.curr_wksht_id)
self.gd_client.DeleteRow(self.list_feed.entry[index])
print 'Deleted!'
def _fill_in_rows(self):
"Create new rows with the new data."
infile = open(self.infile)
header_line = infile.readline().strip()
headers = header_line.split(',')
col_count = len(headers)
while infile:
line = infile.readline().strip()
if len(line) > 0:
row_data = {}
parts = line.split(',')
for col in range(col_count):
row_data[headers[col].lower()] = parts[col].strip()
try:
self._list_insert_action(row_data)
except (gdata.service.RequestError, SyntaxError):
print 'Error inserting row:', row_data
print 'Will skip that row.'
else:
break;
def _list_insert_action(self, row_data):
"Insert a new row with data."
entry = self.gd_client.InsertRow(row_data,
self.curr_key, self.curr_wksht_id)
if isinstance(entry, gdata.spreadsheet.SpreadsheetsList):
print 'Inserted!' |
Next, we will display this data in a Google bar chart.
Creating a Google bar chart with the Google Chart API
The Google Chart API is easy to use and is accessed via a particular google.com URL. You specify the required arguments and the Google Chart API returns a custom chart.
Let's look at an example. Figure 1 shows what we want our batting stats chart to look like. The chart shows batting average (AVG), on-base percentage (OBP), and slugging percentage (SLG).
Figure 1. Example Google bar chart
Take a quick look at our example chart-generating URL in Listing 3, and then we'll break it down and explain its component parts.
Listing 3. Example URL for Google Chart API
var chartURL = "http://chart.apis.google.com/chart?cht=bvg&" +
"chd=t:0.284,0.299,0.271|0.328,0.37,0.298|0.355,0.494,0.4&chs=298x180&" +
"chco=c6d9fd,4d89f9,8a31fb&chdl=AVG|OBP|SLG&chds=0,0.700&" +
"chtt=Minnesota%20Twins+Batting&chxt=y,x&" +
"chxl=0:|0.000|0.100|0.200|0.300|0.400|0.500|0.600|0.700|" +
"1:|D+Young|J+Morneau|C+Gomez&chbh=15"
|
Here are what the different arguments specify:
-
cht: The chart type.bvgmeans we want a bar chart that is vertically-oriented and the bars are grouped (as opposed to stacked). -
chd: The chart data -
chs: The chart size (in pixels) -
chco: The colors of the bars -
chdl: The chart data labels -
chds: The chart scaling parameter -
chtt: The chart title -
chxt: The orientation of multiple axes -
chxl: The chart labels -
chbh: The height of the bars (Bars are, by default, horizontal, but ours are vertical. So, in our case, this is actually the width of the bars.)
Finally, we create our Google Gadget. Google Gadgets are widgets that you can place within Web pages, such as your iGoogle page. They're made up of a combination of XML, HTML, and JavaScript.
Our Google Gadget contains our chart and is shown in Figure 2.
Figure 2. Batting stats Google Gadget
As you can see, Google Gadgets allow you to prompt for user preferences. In ours, we let the users select the team and how many players they want to display.
Figure 3. Gadget with preferences shown
Listing 4 shows the code for our gadget. The code basically does three things:
- Gets the batting stats from the Google Spreadsheet
- Constructs the URL for the Google Chart
- Adds the chart to the HTML in the gadget
We retrieve a JSON (JavaScript Object Notation) feed using the Spreadsheets
Data API to get the data from the spreadsheet. This JSON feed is handed back to our
listStats call-back function, and we parse the
spreadsheet data within that function.
Listing 4. Partial listing of mlb-batting-stats.xml
<Module>
<ModulePrefs
title="MLB Batting Stats"
screenshot="http://mrsabermetrics.sourceforge.net/images/MLBBattingScreenShot.PNG"
description="Displays up-to-date MLB batting stats."
height="200"
width="675"
thumbnail="http://mrsabermetrics.sourceforge.net/images/Babe_Ruth.jpg"
title_url="http://mrsabermetrics.sourceforge.net/"
scrolling="true"
>
</ModulePrefs>
<UserPref name="num_players" display_name="How many players to show?"
datatype="enum" default_value="9" required="true">
<EnumValue value="2" display_value="2"/>
<EnumValue value="3" display_value="3"/>
<EnumValue value="4" display_value="4"/>
.
.
.
<EnumValue value="15" display_value="15"/>
</UserPref>
<UserPref name="team"
display_name="Team"
datatype="enum"
required="true"
>
<EnumValue value="ARI" display_value="Arizona Diamondbacks"/>
<EnumValue value="ATL" display_value="Atlanta Braves"/>
<EnumValue value="BAL" display_value="Baltimore Orioles"/>
.
.
.
<EnumValue value="MON" display_value="Washington Nationals"/>
</UserPref>
<Content type="html">
<![CDATA[
<div id="statsdiv"></div>
<script>
function listStats(root) {
var feed = root.feed;
var entries = feed.entry || [];
var html = [''];
var avgs = [];
var obps = [];
var slgs = [];
var names = [];
// Construct the URL for the Google Chart.
var cht = 'bvg';
html.push(
'<img src="http://chart.apis.google.com/chart?cht=' + cht + '&chd=t:');
for (var i = 0; i < feed.entry.length; ++i) {
var entry = feed.entry[i];
var title = entry.title.$t;
names.push(title);
var content = entry.content.$t;
var parts = content.split(',');
var avg = extractStat(parts[0]);
avgs.push(avg);
var obp = extractStat(parts[1]);
obps.push(obp);
var slg = extractStat(parts[2]);
slgs.push(slg);
}
var numPlayersPref = getStringPref("num_players");
// Show highest AB players
var numPlayers = parseInt(numPlayersPref);
if (numPlayers > names.length) {
numPlayers = names.length;
}
for (var i = 0; i < numPlayers; ++i) {
html.push(avgs[i]);
if (i < numPlayers - 1) {
html.push(",");
}
}
html.push("|");
for (var i = 0; i < numPlayers; ++i) {
html.push(obps[i]);
if (i < numPlayers - 1) {
html.push(",");
}
}
html.push("|");
for (var i = 0; i < numPlayers; ++i) {
html.push(slgs[i]);
if (i < numPlayers - 1) {
html.push(",");
}
}
var width = 1000;
if (numPlayers <= 9) {
var width = 58 * numPlayers + 124;
}
var chs = width + 'x180';
var chds = '0,0.700';
html.push(
"&chs=" + chs +
"&chco=c6d9fd,4d89f9,8a31fb&chdl=AVG|OBP|SLG&chds=" + chds);
var teamAbbrevToName = new Object();
for (var i = 0; i < teamInfoArray.length; i += 3) {
teamAbbrevToName[teamInfoArray[i]] = teamInfoArray[i + 2];
}
var teamName = teamAbbrevToName[getStringPref("team")];
var chxt = 'y,x';
html.push('&chtt=' + teamName + '+Batting&chxt=' + chxt +
'&chxl=0:|0.000|0.100|0.200|0.300|0.400|0.500|0.600|0.700|1:');
for (var i = 0; i < numPlayers; ++i) {
var parts = names[i].split(' ');
name = parts.join("+");
html.push('|' + name)
}
html.push('&chbh=15"/>');
var generatedHTML = html.join("");
// Add the chart to the HTML in the gadget.
document.getElementById("statsdiv").innerHTML = generatedHTML;
}
function extractStat(nameAndStat) {
var parts = nameAndStat.split(":");
var stat = parts[1];
var fStat = parseFloat(stat);
return fStat;
}
function getStringPref(prefName) {
var prefs = new _IG_Prefs();
var stringPref = prefs.getString(prefName);
return stringPref;
}
var teamInfoArray =
new Array('MIN', 'od6', "Minnesota Twins", 'ANA', 'od7', "Los Angeles Angels",
'ARI', 'od4', "Arizona Diamondbacks", 'ATL', 'od5', "Atlanta Braves",
'BAL', 'oda', "Baltimore Orioles", 'BOS', 'odb', "Boston Red Sox",
.
.
.
'TEX', 'ocm', "Texas Rangers", 'TOR', 'ocn', "Toronto Blue Jays");
</script>
<script language="javascript">
var teamToWkshtID = new Object( );
for (var i = 0; i < teamInfoArray.length; i += 3) {
teamToWkshtID[teamInfoArray[i]] = teamInfoArray[i + 1];
}
var wkshtID = teamToWkshtID[getStringPref("team")];
// Get the batting stats from the Google spreadsheet.
// Retrieve a JSON feed using the Spreadsheets Data API
var parmsrc =
"http://spreadsheets.google.com/feeds/list/" +
"o15075496074116042190.8006901556347913051/" + wkshtID +
"/public/basic?alt=json-in-script&callback=listStats";
scriptNode = document.createElement('script');
scriptNode.src = parmsrc;
scriptNode.type = 'text/javascript';
document.getElementsByTagName('head')[0].appendChild(scriptNode);
</script>
]]>
</Content>
</Module>
|
Some improvements could be made to this gadget. Here are a few:
- It would be nice to be able to see the exact numbers represented by the bars. For instance, when your mouse pointer hovers over a bar, the exact stat could appear (like a tool tip does).
- It would be nice to have a feature that lets you compare team AVG/OBP/SLG stats.
- There are lists of team information in the Gadget code which we might want to keep as data in a text file somewhere, rather than having it in the code itself. On the other hand, this data isn't likely to change often (although the Devil Rays did shorten their name to the "Rays"), so keeping this data in the code isn't the end of the world.
You could argue that we should obtain the stats within the gadget code directly from a baseball Web site without going through the intermediate step of using a spreadsheet. We used a spreadsheet as an intermediate step for a couple of reasons. First, these stats are only going to change once a day and they're the same for every user. So, there's no point in needlessly pounding the baseball site that we're scraping. Second, the main point of this article is to show you what's available in the Google Code APIs, and this is a nice example of the Google Spreadsheet API.
There is really no limit to the number of baseball Google Gadgets you could come up with. It would be fun to create a whole 'dashboard' of gadgets that would help you to manage fantasy baseball teams.
Google Gadgets are not the only type of widgets out there. Mac OS X Dashboard Widgets are another popular type of widget.
In this article, we've given you some idea of the type of Web applications you can build using Google Gadgets, the Google Spreadsheet API, and the Google Chart API. With our code samples, we've also given you enough to get started writing your own applications using these Google Code APIs. In the Resources section, you'll find links to more detailed information. We also encourage you to take a look at Joseph Adler's excellent Baseball Hacks book. Even if you think you don't like baseball, you'll find this book interesting.
- Code sample: Source code for this article
- Demo: MLB Batting Stats Google Gadget1
Note
- The Google Gadget built in this article.
Learn
-
Baseball Hacks by
Joseph Adler (Copyright 2006 O'Reilly Media, Inc., 0-596-00942-9) is excellent and
was the main inspiration for this article.
-
Moneyball: The Art
of Winning an Unfair Game by Michael M. Lewis (Copyright 2004 W. W.
Norton & Company, 978-0393324815) is about the objective use of statistics in
baseball management. Even if you're not that interested in baseball, you'll still
find this book fascinating.
- Download the development kits and learn
more about:
- Browse the technology
bookstore for books on these and other technical topics.
Discuss
- Check out developerWorks blogs and get
involved in the developerWorks
community.

Paul Reiners was an Oakland Athletics fan back in the days of Catfish Hunter, Rollie Fingers, and Vida Blue. Paul is the developer of several open source programs, including Automatous Monk, Twisted Life, and Leipzig. He received his M.S. in Applied Mathematics (Theory of Computation) at the University of Illinois at Urbana-Champaign. He lives in Minnesota and, in his spare time, practices the electric bass and competes at TopCoder.
Comments (Undergoing maintenance)







