For data to communicate information effectively, it must be abstracted to a form that's not only functional but also aesthetically pleasing. And, given the scale and complexity of the data that we work with today, it also must be interactive. Among the many data visualization solutions available today, one in particular that is growing in popularity is Weave, or the Web-based Analysis and Visualization Environment.
Weave is more than just a visualization environment. It's an implementation of an idea that goes beyond rendering infographics from data. According to project lead Dr. Georges Grinstein, the goal of Weave is to become the Wikipedia of data and enable anyone to explore existing data for any topic, generate infographics from the data, and make the data more understandable. Further, this process can work for anyone — not just data scientists or engineers, but anyone familiar with typical web-based applications.
This article begins with a quick introduction to the ideas behind open data and then explores the Weave visualization environment in this context, with some hands-on examples.
Data as open source
The open source movement is not restricted to software and hardware. Today, many sites provide public access to data that was never before available. Examples include Data.gov, which is a US federal government website with public access to machine-readable data sets that the government generates. Another example is science.gov, a US government portal for science information and research results. The content on science.gov is provided by participating agencies, including the US Department of Agriculture and US Department of Transportation, the National Science Foundation, and others. Even the United Nations created an open data website that publishes data from its internal agencies and member states.
To ensure that Data.gov evolves in a way that is useful, a separate site serves as an open dialogue with its users. Through this site, you can propose a new idea for the site (data or application), browse others' ideas, and vote on them. You can also see the roadmap for how the Data.gov site will change in the future. See Resources for links to more details.
The movement of treating data as a service is growing, and new examples of opening data from governments and scientific institutions are appearing. Other movements in this genre include:
- Open access, which opens scholarly publications freely on the Internet (including datasets)
- Open science/open research, focused on opening data, methods, and tools for interdisciplinary research
Although the open data movement pre-dates the Internet, the Internet extends its reach to show data to users worldwide. Making data available, plus the emergence of tools such as Weave, creates greater transparency as it increases the number of eyes on a problem (through its data).
Weave is built as a web service that a client accesses through Adobe® Flash® in a browser. The web service is a set of middleware that runs on an application server (such as Apache Tomcat or Oracle GlassFish Server). Weave reads the data to visualize from a data server, which today can be implemented with MySQL or PostgreSQL. Weave uses the user-defined data and the user's other specifications to build a visualization, and then presents this visualization as a Flash-based graphic rendered in the browser. Figure 1 illustrates this general architecture:
Figure 1. General architecture of Weave
The Weave back end is supported on several operating systems (including Linux® and Windows®). To use Weave, you need a browser that is enabled with Flash.
Retrieving a data set
Data.gov is a cloud-based data delivery platform that provides an interactive way to view data (the simplest way to explore a portion of the data sets). For application access to data sets, Data.gov uses Representational State Transfer (REST) APIs. For this article, you need access to a raw data set to load into Weave.
The Data.gov home page lists the platform's latest data sets and provides links to applications, developer resources, and, most important, data. To browse the raw data sets, select Raw Data from the Data pull-down menu to access the Raw Data page. The diverse data sets are sorted by relevance. To change the view, filter by the agency whose data you want to view (such as the Department of Commerce) or by category of data (such as banking, finance, and insurance).
For a first example, I perused the list and chose a small data set called FY09 Education Recipients by State. The data set identifies the numbers of individuals who use US Department of Veterans Affairs education benefits per state in fiscal year (FY) 2009. You can find this data at the URL: https://explore.data.gov/Education/FY-09-Education-Recipients-by-State-/6qaq-tbe6
Typically, you find a download link for the data set, usually through an external link to the source of the data (in this case, the Veterans Benefits Administration). This data set is stored in comma-separated values (CSV) format. To follow along with the first example of using Weave, download the CSV file to your local system.
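Before you load the file into Weave, it can help to confirm that the CSV parses cleanly. The following is a minimal Python sketch, independent of Weave. The inline sample rows and the STATE column name are made up for illustration (only the TOTAL column name appears in this article), so substitute the downloaded file and its real headers.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded file. The real
# data set has different values and may use different column names.
SAMPLE_CSV = """STATE,TOTAL
Alabama,100
Alaska,20
Arizona,150
"""


def read_rows(text_stream):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(text_stream))


def summarize(rows):
    """Return the column names and the row count for a quick sanity check."""
    columns = list(rows[0].keys()) if rows else []
    return columns, len(rows)


rows = read_rows(io.StringIO(SAMPLE_CSV))
columns, count = summarize(rows)
print(columns, count)
```

To check a real download, pass `open(path, newline="")` to `read_rows` in place of the `io.StringIO` object.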
The Weave installation is one area that the Weave development team admits is not as simple as it might be. Weave requires the installation of an application server, a database engine, Java™ technology, Flash, and the Weave middleware on your host. The team provides detailed instructions on how to install Weave on the Weave site (see Resources for a link to the user guide). If you don't feel up to the task, you can use the instance of Weave running at the team's site. Go to http://oicweave.org and then scroll down to the Demos section. Clicking the image creates an instance of Weave running in your browser through which you can begin to visualize data. That demonstration also incorporates sample data, which is a great way to see the more complex capabilities of Weave in the context of geographical data visualization.
Figure 2 displays what you find when you enter Weave: A set of pull-down menus and a few icons for importing data and replaying activities. As you create and evolve your visualizations, you can easily undo or redo your changes with these options.
Figure 2. Starting Weave (blank window)
Next, begin to import your data set. Click Data -> Load my data to display an import window. In this example, load your data set through the Load local file option. Click Next to see your data properly parsed and visible in the Import Data to Weave window, as in Figure 3:
Figure 3. Importing data into Weave
After you change the unique identifier column as appropriate, click Close to finish importing your data. (One Weave simplification is that you can define a data source through a URL, so that Weave retrieves and imports the data automatically. At the time of this writing, this option raises an error on the demonstration site.)
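A column works as a unique identifier only if each row has a distinct, non-empty value in it. The sketch below shows one way to check a candidate column in Python before you import. It is not part of Weave, and the rows and the STATE and REGION column names are hypothetical.

```python
from collections import Counter

# Hypothetical rows, as a CSV reader might produce them.
rows = [
    {"STATE": "Alabama", "REGION": "South"},
    {"STATE": "Alaska", "REGION": "West"},
    {"STATE": "Arizona", "REGION": "West"},
]


def is_unique_key(rows, column):
    """Return True if every row has a non-empty, distinct value in column."""
    values = [row.get(column, "") for row in rows]
    if any(value == "" for value in values):
        return False
    return len(set(values)) == len(values)


def duplicate_values(rows, column):
    """Return the sorted values that occur more than once in column."""
    counts = Counter(row.get(column, "") for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


print(is_unique_key(rows, "STATE"))      # STATE values are all distinct
print(duplicate_values(rows, "REGION"))  # REGION repeats, so it is a poor key
```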
With your data now imported into Weave, your next step is to create a simple plot of the data. Click Tools -> Add Scatterplot to display a scatter plot (although the plot does not yet meet your expectations). Click the vertical axis to change the attribute to TOTAL, and then click Save & Close to configure the plot. Your plot now looks similar to Figure 4:
Figure 4. Simple plot from your data imports
To display a state's name and the number of education benefit recipients in that state, hover your mouse over the corresponding point in the scatter plot. Later you'll see how this useful feature becomes even more powerful in a multigraph environment.
Next, modify this scatter plot by restricting it to a subset of the data. Select the first six states in the plot. A dashed-line box temporarily appears around the data as you make the selection. After you select the data, the plot shows the first six states as normal points and blurs the points for the rest of the states. Right-click within the plot, and then click Create subset from selected record(s). The plot now looks like Figure 5:
Figure 5. Creating a plot from a subset of your data
The selection is not limited to these states: you can select the top four states in the plot, or any other group of points that you can enclose in a rectangle.
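Conceptually, a rectangular selection keeps every record whose plotted position falls inside the box. The following Python sketch illustrates that idea only; it is not Weave's implementation, and the `Point` type, the sample values, and the function name are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    key: str   # record identifier, for example a state name
    x: float   # horizontal position in the plot
    y: float   # vertical position in the plot


def select_in_rectangle(points, x_min, x_max, y_min, y_max):
    """Return the points that lie inside the rectangle, edges included."""
    return [
        p for p in points
        if x_min <= p.x <= x_max and y_min <= p.y <= y_max
    ]


# Made-up positions: x is the row index, y is a hypothetical TOTAL value.
points = [
    Point("Alabama", 0, 100),
    Point("Alaska", 1, 20),
    Point("Arizona", 2, 150),
    Point("Arkansas", 3, 60),
]

# A box that covers the first three rows at any height.
subset = select_in_rectangle(points, x_min=0, x_max=2, y_min=0, y_max=200)
print([p.key for p in subset])
```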
Another simple example comes from worldwide M1+ earthquakes data, which records earthquake information from the past seven days. Get this data from the URL: https://explore.data.gov/Geography-and-Environment/Worldwide-M1-Earthquakes-Past-7-Days/7tag-iwnu
Download the CSV format of the data, which consists of the source of the data, the date-time, the magnitude, and the location (and some other information that you will not incorporate). For this example, request a scatter plot, and use the longitude as the x-axis and the latitude as the y-axis. Also, request that the size of the points represent the magnitude of the earthquake. Then create a bar chart to represent the magnitude data. Viewing the two plots together illustrates one of the novel ideas in Weave: when the cursor selects the largest-magnitude earthquake in the bar chart (a 7.7 magnitude earthquake in the Sea of Okhotsk at the time of this writing), the same sample is automatically highlighted in the scatter plot; it's the largest bubble in the upper right of Figure 6:
Figure 6. Visualizing earthquake data in multiple plots
This ability to link data in multiple plots interactively is a great way to explore your data. This example is simple, but you can look at much more complex examples at the Weave website (including some that address geographical information). Note also in this example the coloring of the individual items. Weave automatically applies colors (which you can change easily) to identify the source of the data. The dark blue data points indicate that the data came from the United States, and the white samples indicate that the data was collected in Alaska.
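The point size and color encodings described above can be sketched in a few lines of Python. This is an illustration of the general technique, not Weave's code: the radius range, the palette, the source codes "ak" and "us", and the magnitudes are all assumed values.

```python
def linear_scale(value, domain_min, domain_max, range_min, range_max):
    """Map value from [domain_min, domain_max] onto [range_min, range_max]."""
    if domain_max == domain_min:
        # Every value is identical, so use the middle of the output range.
        return (range_min + range_max) / 2
    t = (value - domain_min) / (domain_max - domain_min)
    return range_min + t * (range_max - range_min)


def assign_colors(categories, palette):
    """Give each distinct category a palette color, cycling if needed."""
    distinct = sorted(set(categories))
    return {c: palette[i % len(palette)] for i, c in enumerate(distinct)}


# Made-up magnitudes and source codes for three earthquakes.
magnitudes = [1.0, 4.0, 7.0]
sources = ["ak", "us", "ak"]

# Larger magnitudes get larger radii, between 2 and 14 pixels.
radii = [
    linear_scale(m, min(magnitudes), max(magnitudes), 2, 14)
    for m in magnitudes
]
colors = assign_colors(sources, ["white", "darkblue"])
print(radii, colors)
```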
As in the previous example, if you select and view a subset of the data in a plot, the data in all related plots is modified too, as a function of that subset. This linking capability of data representation in the plots is a unique feature and a great way to focus on a particular subset.
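One common way to build this kind of linking is to let every plot observe a single shared selection of record keys. The Python sketch below shows that pattern under stated assumptions: the class names, the record keys, and the earthquake values are hypothetical, and the article does not describe how Weave implements linking internally.

```python
class SharedSelection:
    """Holds the selected record keys and notifies every subscribed view."""

    def __init__(self):
        self._keys = frozenset()
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def select(self, keys):
        self._keys = frozenset(keys)
        for listener in self._listeners:
            listener(self._keys)


class View:
    """A plot that highlights whichever of its records are selected."""

    def __init__(self, name, records, selection):
        self.name = name
        self.records = records      # maps record key to record data
        self.highlighted = []
        selection.subscribe(self._on_selection)

    def _on_selection(self, keys):
        self.highlighted = sorted(k for k in self.records if k in keys)


# Made-up earthquake records shared by both views.
quakes = {
    "eq1": {"magnitude": 2.1, "lon": -150.0, "lat": 61.0},
    "eq2": {"magnitude": 7.7, "lon": 153.0, "lat": 54.0},
    "eq3": {"magnitude": 4.0, "lon": -120.0, "lat": 36.0},
}

selection = SharedSelection()
bars = View("bar chart", quakes, selection)
scatter = View("scatter plot", quakes, selection)

# Selecting the largest bar highlights the same record in both views.
largest = max(quakes, key=lambda k: quakes[k]["magnitude"])
selection.select([largest])
print(bars.highlighted, scatter.highlighted)
```

Because both views read from the same selection object, neither needs to know that the other exists, which is what lets the pattern extend to any number of plots.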
The future of Weave
Weave remains a platform in beta, but more-advanced features are on the way to its growing population of users. The key challenge for the Weave platform is lowering the bar for visualizing data. Working with Weave can be more complex than working with spreadsheet solutions (particularly when you work with geographic data). Expect Weave developers to pay attention to this key goal of Weave.
One interesting feature now in development is the ability to collaborate on visualizations. People in multiple geographic locations can collaborate in real time on a visualization without third-party collaboration tools. Another innovation is the ability to tie mapping visualizations to other information. The project lead described this capability as a way to combine Google Maps with collections of other information, such as documents, to expand the capabilities of visualization to geographically located resources.
Liberating both data and the ability to visualize that data is an idea that is beginning to grow. Sites like Data.gov and tools like Weave might someday help everyone visualize data of interest and interactively reason about the data in new ways to understand it better. Perhaps some day Weave might be a core technology inside Data.gov and other open data sites, simplifying both access and visualization in one place. Although still in its infancy, this movement is a great start to bringing the power of infographics to everyone and increasing the value of open data.
- Weave: Explore the Weave website, including a detailed user guide.
- Data.gov: This official US government site is a next-generation platform for making public data universally accessible. Data.gov provides data not only through applications that simplify viewing but also as raw data sets and through RESTful APIs. Browse the data sets available through the Data.gov catalog. Filter your search based on your interests, along with types of data (such as raw, geographical, charts, or calendars).
- Data.gov dialogue site: Interested data users can go here to help evolve Data.gov for the future. Through this site, view the current roadmap for Data.gov and see what other users are proposing. You can also vote on ideas or propose your own to help define where Data.gov goes in the future.
- Science.gov: This portal provides access to more than 55 databases and 2,100 websites from 13 federal agencies for US government science information. As on Data.gov, you can restrict your searches by search criteria or by specific agencies.
- Creating a Visualization Page in Weave: View a video that illustrates how to manipulate data within Weave.
- Open data: Wikipedia provides a great introduction to the topic of open data, including its history and where you can find it.