Data.gov for government agencies

Learn about the U.S. Federal Government's tool for open data

As awareness of the value of open data grows, entire new economies have sprung up around its use and management. Sophisticated taxpayers often demand open access to public data from governments, and even less informed users want to know how this type of information enhances data services such as online maps and charts. In 2009, the U.S. Federal Government launched Data.gov, a site to aggregate feeds of government data. Pressure on agencies to publish information at Data.gov has been steady: the Open Government Directive of 2009 requires all Federal agencies to post at least three high-value data sets online and register them on Data.gov. In this article, learn about Data.gov, what your agency needs to know to participate in this revolution in government, and ideas for participating efficiently.

Uche Ogbuji, Partner, Zepheira, LLC

Uche Ogbuji is a partner at Zepheira, where he oversees the creation of sophisticated web catalogs and other richly contextual databases. He has a long history of pioneering work in advanced web technologies such as XML, the semantic web, and web services, and in open source projects such as Akara, an open source platform for web data applications. He is a computer engineer and writer born in Nigeria, living and working near Boulder, Colorado, USA. You can find more about Mr. Ogbuji at his weblog, Copia.



28 February 2012

Also available in Russian

Overview

In 2009 the United States, under Federal Chief Information Officer (CIO) Vivek Kundra, launched an ambitious website and service. Data.gov serves as a repository for information collected and managed by the federal government, and is available for use by the public.

Leaders and developers in technology have long called for a culture of open data, meaning transparency and portability of data generated and used by institutions. The Internet has transformed every sphere of society largely through its foundation on the free movement of information almost regardless of traditional borders, interests, and practical barriers. Most proponents of data transparency accept that barriers should always remain for reasons of privacy and security, but they argue for as much availability and interchange of information as possible. They claim information flow is a powerful engine for generating new business and public good in the knowledge economy.

You can imagine that such an argument would garner the attention of the U.S. government, which looks to nurture business and increase public good. In addition, the data in question belongs to the taxpayers, who fund the agencies that control the data. This is an extension of the open data argument in business, where customers demand access to the data associated with their accounts. The launch of Data.gov signaled that one of the largest organizations in the world recognized these facts, and it opened up some exciting possibilities for businesses, media, and concerned citizens in general.


Past and future

Data.gov is not the first open data initiative in the U.S. In 2000, the National Institutes of Health (NIH), working with the Food and Drug Administration (FDA), launched ClinicalTrials.gov, a site that made publicly available information pertaining to the clinical trials that are part of the regulatory process for any medical therapy. NIH initially provided ClinicalTrials.gov as a seed site that then grew in scope under the FDA's expanding regulatory guidance. The site has since become a rich trove of information related to the development of drugs, whether privately or publicly funded. Another site, Science.gov, has made U.S. government-sponsored scientific information and research results available since 2002.

With the growing importance of such pioneering sites, governments worldwide discussed open data extensively in the late 2000s, particularly in the U.S. and in the United Kingdom, where the government launched Data.gov.uk shortly after its U.S. counterpart. An important catalyst in the U.S. was the Open Government Initiative (OGI), put in place by President Barack Obama on his first day in office, January 20, 2009.

The Office of Management and Budget (OMB) prepared and released a Concept of Operations (CONOPS) document to give shape to Data.gov, and continues to evolve this document as a blueprint for the site. Pursuant to the OGI, in December 2009 the OMB also released a memo entitled "Open Government Directive" to Federal agencies. The memo articulated a strong default position of openness with data that agencies should adopt, citing, for example, Attorney General Eric Holder's new guidelines establishing openness as the Federal Government's default position for matters relating to the Freedom of Information Act (FOIA). The memo included the following instruction:

Within 45 days, each agency shall identify and publish online in an open format at least three high-value data sets (see attachment section 3.a.i) and register those data sets via Data.gov. These must be data sets not previously available online or in a downloadable format.

With this simple stroke, the OMB mandated the growth of valuable information within Data.gov. This memo is a very interesting executive-level document for study by any government institution interested in an open data policy, and I shall return to it in this article.

Data.gov is funded through the Electronic Government Fund (EGF) budget item, but because it was established purely through executive order, it is not guaranteed funding through congressional appropriations. The 2011 Federal budget considerably reduced the EGF, which led to some cutbacks at Data.gov and the departure of key staff, such as Program Executive Sanjeev Bhagowalia. Nevertheless, the project has adapted and, somewhat surprisingly, flourished despite these setbacks. A "next generation" Data.gov was launched, moving most of the site infrastructure to the cloud to reduce maintenance costs and to enable more dynamic processing of the data on the site itself, rather than only simple, static downloads.


What Data.gov offers

Data.gov hosts data in several forms, including raw data and geospatial data; the latter is especially suited for use in mapping applications and mashups. This article focuses on raw data. Data.gov hosts some data sets as links and access modules to external government sites, and hosts others fully. In the former case (termed "external datasets"), Data.gov hosts just the metadata; in the latter case (termed "datasets"), it hosts both data and metadata. If you are an agency looking to publish on Data.gov, which approach you take will depend on whether you already have a platform in place to host data yourself and whether you want the advantage of Data.gov's interactive data set features. Either way, you gain the advantage of the Data.gov catalog.

Data catalog

At the heart of Data.gov is the catalog, which allows users and applications to browse, explore, search, and filter data sets. Figure 1 is a screen shot from the catalog of raw data. It shows the name, description, hit count, and type of each data set. Options on the left-hand side filter by data set type or federal agency, and you can also search data set metadata.
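The catalog's combination of filters and metadata search can be sketched in a few lines of Python. This is a minimal illustration, not Data.gov's implementation: the `CatalogEntry` fields and the `filter_catalog` function are hypothetical, modeled only on the metadata visible in Figure 1 (name, description, hit count, type) plus the agency filter and the hosted versus external distinction described above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CatalogEntry:
    # Hypothetical metadata record, modeled on the columns visible in Figure 1.
    name: str
    description: str
    agency: str
    data_type: str   # e.g. "raw" or "geospatial"
    hit_count: int
    hosted: bool     # True: Data.gov holds data and metadata ("dataset");
                     # False: Data.gov holds metadata only ("external dataset")


def filter_catalog(entries: Iterable[CatalogEntry],
                   data_type: Optional[str] = None,
                   agency: Optional[str] = None,
                   text: Optional[str] = None) -> List[CatalogEntry]:
    """Return entries matching every given criterion, most-viewed first.

    Criteria left as None are ignored; `text` is a case-insensitive
    substring search over the name and description metadata.
    """
    needle = text.lower() if text else None
    matches = []
    for entry in entries:
        if data_type is not None and entry.data_type != data_type:
            continue
        if agency is not None and entry.agency != agency:
            continue
        if needle and needle not in f"{entry.name} {entry.description}".lower():
            continue
        matches.append(entry)
    # Order by popularity, as a catalog listing might.
    return sorted(matches, key=lambda e: e.hit_count, reverse=True)
```

In this sketch the search touches only metadata, so hosted and external entries are treated alike, which mirrors the point that either hosting approach benefits from the catalog.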

Figure 1. Screen shot from Data.gov raw data catalog

Interactive data sets

If an agency chooses to have Data.gov host the data as well as the metadata, it gains the benefit of Data.gov's interactive web display of the data. Such interactive data sets are displayed online in a table that supports searching, sorting, filtering, and display in charts and graphs. Many users can therefore get the information they need without downloading the raw data and processing it themselves. Figure 2 is a screen shot from the interactive view of one of the data sets, "Tax Year 2007 County Income Data". It shows the first 21 of 3193 rows and some of the columns, from "State Abbreviation" to "Wages Income." The tools at the top let you filter, visualize, or export the data.

Figure 2. Screen shot from Data.gov interactive data set view

Figure 3 is a screen shot from the same data set as Figure 2, illustrating the filtering features. The filtering criteria appear on the right-hand side.
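Conceptually, filtering of this kind applies a set of column conditions to every row. The following Python sketch illustrates the idea on two rows of the data set; the function names, the operator vocabulary, and the assumption that all criteria are combined with AND are illustrative choices, not a description of how Data.gov implements its filter panel.

```python
import operator
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Condition = Tuple[str, str, Any]  # (column, operator name, value)

# Comparison operators a filter panel might offer; the names are hypothetical.
OPERATORS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
}


def apply_filter(rows: List[Row], conditions: List[Condition]) -> List[Row]:
    """Keep only rows that satisfy every condition (criteria are ANDed)."""
    return [
        row for row in rows
        if all(OPERATORS[op](row[col], value) for col, op, value in conditions)
    ]


def sort_rows(rows: List[Row], column: str, descending: bool = False) -> List[Row]:
    """Return a new list ordered by one column, as clicking a header might."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


# Two rows from the county income data set (see Listing 1), trimmed to a few columns.
rows = [
    {"state_abbreviation": "CO", "county_code": 0, "county_name": "COLORADO",
     "adjusted_gross_income_in_thousands_": 128175529},
    {"state_abbreviation": "CO", "county_code": 1, "county_name": "Adams County",
     "adjusted_gross_income_in_thousands_": 8622959},
]

# County code 0 appears to be the state-wide total, so exclude it to keep counties only.
counties = apply_filter(rows, [("state_abbreviation", "eq", "CO"),
                               ("county_code", "gt", 0)])
```

Keeping filtering and sorting as separate, composable steps means a subset selected by one can be handed directly to the other, or to an export routine.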

Figure 3. Screen shot from filtering in Data.gov interactive data set view

Exporting data

The interactive data set feature also allows a user to export the full data set, or the subset selected by an applied filter. Figure 4 is a screen shot of the dialog for exporting the rows selected by the filter shown in Figure 3.
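If you want similar behavior in your own tooling, exporting a filtered subset reduces to serializing only the rows (and columns) that survive the filter. Here is a minimal Python sketch using the standard `csv` module; `export_csv` is a hypothetical helper, and CSV is used only as a familiar example format.

```python
import csv
import io
from typing import Any, Dict, Iterable, List, TextIO


def export_csv(rows: Iterable[Dict[str, Any]], columns: List[str], out: TextIO) -> int:
    """Write the selected columns of each row to `out` as CSV; return the row count.

    Columns not listed are silently dropped (extrasaction="ignore").
    When writing to a real file, open it with newline="" as the csv docs advise.
    """
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Usage: export only the county name and income columns of a one-row subset.
buffer = io.StringIO()
export_csv([{"county_name": "Adams County", "county_code": 1,
             "adjusted_gross_income_in_thousands_": 8622959}],
           ["county_name", "adjusted_gross_income_in_thousands_"], buffer)
```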

Figure 4. Screen shot of export from Data.gov interactive data set view

After you click a format, the file downloads immediately to your browser. Listing 1 is a clipping of the first two rows from the resulting XML.

Listing 1. The first 2 rows from the resulting XML
<?xml version="1.0"?>
<response>
  <row>
    <row _id="249" _uuid="A198E315-1C23-4004-A6F2-97321F9AC9ED"
          _position="249" _address="http://explore.data.gov/views/d2bg-b3vp/rows/249">
      <state_code>8</state_code>
      <county_code>0</county_code>
      <state_abbreviation>CO</state_abbreviation>
      <county_name>COLORADO</county_name>
      <total_number_of_tax_returns>2106989</total_number_of_tax_returns>
      <adjusted_gross_income_in_thousands_>128175529</adjusted_gross_income_in_thousands_>
      <wages_and_salaries_incomes_in_thousands_>92308039</wages_and_salaries_incomes_in_thousands_>
      <dividend_incomes_in_thousands_>2775567</dividend_incomes_in_thousands_>
      <interest_income_in_thousands_>3872386</interest_income_in_thousands_>
    </row>
    <row _id="250" _uuid="1576D628-29CC-4D44-BCA5-76750198AEF0"
          _position="250" _address="http://explore.data.gov/views/d2bg-b3vp/rows/250">
      <state_code>8</state_code>
      <county_code>1</county_code>
      <state_abbreviation>CO</state_abbreviation>
      <county_name>Adams County</county_name>
      <total_number_of_tax_returns>180985</total_number_of_tax_returns>
      <adjusted_gross_income_in_thousands_>8622959</adjusted_gross_income_in_thousands_>
      <wages_and_salaries_incomes_in_thousands_>7195284</wages_and_salaries_incomes_in_thousands_>
      <dividend_incomes_in_thousands_>64424</dividend_incomes_in_thousands_>
      <interest_income_in_thousands_>138173</interest_income_in_thousands_>
    </row>
    <!-- remaining rows omitted -->
  </row>
</response>
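Because the export is plain XML, Python's standard `xml.etree.ElementTree` module is enough to turn it into records. The sketch below assumes the nesting shown in Listing 1 (a `response` element wrapping an outer `row` that contains one inner `row` per record); the `parse_rows` name and the rule of converting all-digit values to integers are illustrative choices.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def parse_rows(xml_text: str) -> List[Dict[str, Any]]:
    """Convert the exported XML into one dict per data row."""
    root = ET.fromstring(xml_text)
    records = []
    # Records are the inner <row> elements nested inside the outer <row>.
    for row in root.findall("./row/row"):
        record: Dict[str, Any] = {"_id": row.get("_id")}
        for field in row:
            text = (field.text or "").strip()
            # Treat purely numeric values (counts, thousands of dollars) as ints.
            record[field.tag] = int(text) if text.lstrip("-").isdigit() else text
        records.append(record)
    return records


# A trimmed copy of the second row in Listing 1.
SAMPLE = """<response>
  <row>
    <row _id="250" _position="250">
      <state_abbreviation>CO</state_abbreviation>
      <county_name>Adams County</county_name>
      <total_number_of_tax_returns>180985</total_number_of_tax_returns>
      <adjusted_gross_income_in_thousands_>8622959</adjusted_gross_income_in_thousands_>
    </row>
  </row>
</response>"""

for rec in parse_rows(SAMPLE):
    print(rec["county_name"], rec["adjusted_gross_income_in_thousands_"])
```

Parsing the whole document in memory is fine for a few thousand rows; for much larger exports, `ET.iterparse` lets you process rows incrementally.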

The data export dialog also offers options to print the data and to access the data set through an API from an external program. For example, you can retrieve the 2007 county income data set as JavaScript Object Notation (JSON), a format suited to many web applications, simply by sending an HTTP GET request to http://explore.data.gov/api/views/wvps-imhx/rows.json.
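A program can make that request with nothing more than Python's standard library. This is a minimal sketch: the `fetch_json` helper is hypothetical, the URL is the one quoted above and the service behind it may have changed, and the article does not document the layout of the returned JSON, so the function simply returns the decoded structure for you to inspect.

```python
import json
import urllib.request
from typing import Any

# Endpoint given in the text above for the 2007 county income data set.
ROWS_JSON_URL = "http://explore.data.gov/api/views/wvps-imhx/rows.json"


def fetch_json(url: str = ROWS_JSON_URL, timeout: float = 30.0) -> Any:
    """Send a plain HTTP GET to `url` and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Honor a charset in the Content-Type header if present; otherwise assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

Calling `fetch_json()` returns ordinary Python dicts and lists, so it is worth examining the top-level keys before writing code that depends on a particular structure.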


What you can learn from Data.gov

If you are a government agency in the U.S. or anywhere else, and you are looking for the best ways to empower citizens and increase the recognition of the value that you provide, you can learn a lot from what Data.gov has accomplished.

First, consider the value of open data. Holding tightly to the data used in doing the people's work rarely brings any inherent advantage. Increasingly, citizens the world over are looking for transparency and utility from their governments. It is worth making a clear-eyed assessment of whether to adopt a policy of making data available unless there are well-understood reasons not to.

If you have decided to participate in the move toward open data, you will need to reconsider your overall information technology architecture. It becomes increasingly important to establish a clear chain of custody for data, from collection to storage, tracking provenance and other metadata along the way. Where possible, make data export to simple, standard formats a requirement of any software requisition. Consider a solution such as the IBM Government Industry Framework (see Resources), which enables smarter government across the board, including support for increased transparency.
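The article does not prescribe a mechanism for a chain of custody, but one common technique is a hash-chained log: each custody event records a digest of the data and the hash of the previous event, so a later edit to any earlier entry becomes detectable. The Python sketch below illustrates that technique under those assumptions; the `CustodyEvent` and `CustodyLog` classes and their fields are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CustodyEvent:
    step: str         # e.g. "collected", "validated", "stored", "published"
    actor: str        # who handled the data at this step
    timestamp: str    # ISO 8601 string supplied by the caller
    data_sha256: str  # digest of the data as it existed after this step
    prev_hash: str    # hash of the previous event ("" for the first event)

    def event_hash(self) -> str:
        # Canonical JSON (sorted keys) so identical events always hash identically.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


@dataclass
class CustodyLog:
    events: List[CustodyEvent] = field(default_factory=list)

    def record(self, step: str, actor: str, timestamp: str, data: bytes) -> CustodyEvent:
        """Append an event linked to the hash of the previous one."""
        prev = self.events[-1].event_hash() if self.events else ""
        event = CustodyEvent(step, actor, timestamp,
                             hashlib.sha256(data).hexdigest(), prev)
        self.events.append(event)
        return event

    def verify(self) -> bool:
        """Check every link; an edited earlier event breaks its successor's link.

        Note: the newest event has no successor, so it is only protected if
        its hash is also recorded somewhere outside this log.
        """
        prev = ""
        for event in self.events:
            if event.prev_hash != prev:
                return False
            prev = event.event_hash()
        return True
```

Recording a data digest at each step also lets anyone holding a published copy confirm that it matches what the log says was released.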

Other recent computing trends are relevant as well. Open source software very often has strong support for open, standard data formats. Cloud computing offers lower infrastructure and maintenance costs if you are willing to accept some risk from changing your approach to security. If you do decide on open data, those risks are implicitly reduced, because data you intend to publish anyway needs less protection from disclosure.

Data.gov is not yet three years old and has already traveled a winding road, but with almost 400,000 data sets and well over 1,000 known applications, it has proved a success by any measure, which is all the more remarkable given its budgetary difficulties. That Data.gov was able to change course when the going got tough and move to the cloud illustrates the ancillary benefits in flexibility that come from pursuing the bold strategy of smart, transparent, open government that the U.S. undertook in 2009.

Resources

Learn

Get products and technologies

  • Visit Data.gov for thousands of open data sets from the U.S. Federal Government.
  • Adopt the IBM Government Industry Framework to help you integrate systems and processes across a broad range of public service functions.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, or use a product in a cloud environment.

Discuss

  • Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
