Weaving data visualizations with the Weave platform

An introduction to Weave and open data

Weave is a new platform for the visualization of trend and geographical data, developed at the University of Massachusetts Lowell. Weave supports a wide range of uses and is intended for both novice and advanced users. Explore the use of Weave for visualizing data from publicly available repositories with the hands-on examples in this article.

M. Tim Jones, Independent author, Consultant

M. Tim Jones is an embedded firmware architect and the author of Artificial Intelligence: A Systems Approach, GNU/Linux Application Programming (now in its second edition), AI Application Programming (in its second edition), and BSD Sockets Programming from a Multilanguage Perspective. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and networking protocols development. Tim is a platform architect with Intel and author in Longmont, Colo.



23 April 2013

For data to communicate information effectively, it must be abstracted to a form that's not only functional but also aesthetically pleasing. And, given the scale and complexity of the data that we work with today, it also must be interactive. Among the many data visualization solutions available today, one in particular that is growing in popularity is Weave, or the Web-based Analysis and Visualization Environment.

Weave is more than just a visualization environment. It's an implementation of an idea that goes beyond rendering infographics from data. According to project lead Dr. Georges Grinstein, the goal of Weave is to become the Wikipedia of data and enable anyone to explore existing data for any topic, generate infographics from the data, and make the data more understandable. Further, this process can work for anyone — not just data scientists or engineers, but anyone familiar with typical web-based applications.

This article begins with a quick introduction to the ideas behind open data and then explores the Weave visualization environment in this context, with some hands-on examples.

Data as open source

The open source movement is not restricted to software and hardware. Today, many sites provide public access to data that was never before available. Examples include Data.gov, which is a US federal government website with public access to machine-readable data sets that the government generates. Another example is science.gov, a US government portal for science information and research results. The content on science.gov is provided by participating agencies, including the US Department of Agriculture and US Department of Transportation, the National Science Foundation, and others. Even the United Nations created an open data website that publishes data from its internal agencies and member states.

Good, but not perfect

Data.gov is a great start for a clearinghouse of data, but it does suffer from typical web issues. As I tried to access many data sets, I found they were either unavailable or their hosting sites were continually down for maintenance. The Data.gov site itself is brittle, which might be IT-related or the result of heavy use.

To ensure that Data.gov evolves in a way that is useful, a separate site serves as an open dialogue with its users. Through this site, you can propose a new idea for the site (data or application), browse others' ideas, and vote on them. You can also see the roadmap for how the Data.gov site will change in the future. See Resources for links to more details.

The movement of treating data as a service is growing, and new examples of opening data from governments and scientific institutions are appearing. Other movements in this genre include:

  • Open access, which opens scholarly publications freely on the Internet (including datasets)
  • Open science/open research, focused on opening data, methods, and tools for interdisciplinary research

Although the open data movement pre-dates the Internet, the Internet extends its reach to show data to users worldwide. Making data available, plus the emergence of tools such as Weave, creates greater transparency as it increases the number of eyes on a problem (through its data).


Weave architecture

Weave is built as a web service that a client accesses through Adobe® Flash® in a browser. The web service is a set of middleware that runs on an application server (such as Apache Tomcat or Oracle GlassFish Server). Weave reads the data that it visualizes from a data server, which today can be implemented with MySQL or PostgreSQL. Weave combines the user-supplied data with the user's other specifications to build a visualization, and then presents this visualization as a Flash-based graphic rendered in the browser. Figure 1 illustrates this general architecture:

Figure 1. General architecture of Weave
Diagram showing Weave's general architecture
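
The request path in Figure 1 can be sketched in a few lines of Python. Treat this purely as an illustration under stated assumptions: sqlite3 stands in for the MySQL or PostgreSQL data server, the hypothetical serve_column function stands in for the middleware on the application server, and the table name, column names, and values are placeholders. None of it is Weave's actual code or API.

```python
import json
import sqlite3

# Data server tier: an in-memory SQLite database stands in for MySQL/PostgreSQL.
def make_data_server():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (record_id TEXT PRIMARY KEY, total INTEGER)")
    # Placeholder rows for illustration only, not real data.
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        [("A", 10), ("B", 25), ("C", 5)],
    )
    conn.commit()
    return conn

# Middleware tier (hypothetical): answer a client request for one column,
# keyed by record ID. Only whitelisted column names reach the SQL string.
ALLOWED_COLUMNS = {"total"}

def serve_column(conn, column):
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    rows = conn.execute(
        f"SELECT record_id, {column} FROM records ORDER BY record_id"
    )
    return json.dumps({record_id: value for record_id, value in rows})

# Client tier: in Weave, the browser client would receive data and render it.
if __name__ == "__main__":
    conn = make_data_server()
    print(serve_column(conn, "total"))
```

The point of the sketch is the separation of concerns: the client never talks to the database directly, and the middleware decides which columns it is willing to serve.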

The Weave back end is supported on several operating systems (including Linux® and Windows®). To use Weave, you need a browser that is enabled with Flash.


Retrieving a data set

Data.gov is a cloud-based data delivery platform that provides an interactive way to view data (the simplest way to explore a portion of the data sets). For application access to data sets, Data.gov uses Representational State Transfer (REST) APIs. For this article, you need access to a raw data set to load into Weave.

The Data.gov home page lists the platform's latest data sets and provides links to applications, developer resources, and — most important — data. To browse the raw data sets, select Raw Data from the Data pull-down menu to access the Raw Data page. The diverse data sets are sorted by relevance. To change the view, define the agency (such as Department of Commerce) with data that you want to view or the categories of data (such as banking, finance, and insurance).

For a first example, I perused the list and chose a small data set called FY09 Education Recipients by State. The data set identifies the numbers of individuals who use US Department of Veterans Affairs education benefits per state in fiscal year (FY) 2009. You can find this data at the URL: https://explore.data.gov/Education/FY-09-Education-Recipients-by-State-/6qaq-tbe6

Typically, you find a download link for the data set, usually through an external link to the source of the data (in this case, the Veterans Benefits Administration). This data set is stored in comma-separated values (CSV) format. To follow along with the first example of using Weave, download the CSV file to your local system.
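
If you prefer to script the download, the following sketch derives a CSV endpoint from a data set page URL. It assumes the Socrata Open Data API (SODA) convention of /resource/<data set ID>.csv and assumes that the data set ID is the last segment of the page URL; whether a given host actually serves that endpoint is not verified here, and the helper name soda_csv_url is hypothetical. The code only builds the URL and makes no network request.

```python
from urllib.parse import urlencode, urlsplit

def soda_csv_url(page_url, limit=None):
    """Build a SODA-style CSV endpoint from a data set page URL (assumed convention)."""
    parts = urlsplit(page_url)
    # Assumption: the data set ID is the last path segment, e.g. "6qaq-tbe6".
    dataset_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    url = f"{parts.scheme}://{parts.netloc}/resource/{dataset_id}.csv"
    if limit is not None:
        # "$limit" caps the number of rows returned; keep "$" unescaped for readability.
        url += "?" + urlencode({"$limit": limit}, safe="$")
    return url

if __name__ == "__main__":
    page = ("https://explore.data.gov/Education/"
            "FY-09-Education-Recipients-by-State-/6qaq-tbe6")
    print(soda_csv_url(page))
```

You could pass the resulting URL to any HTTP client, or save the response to a local file and load that file into Weave as described in the next section.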


Using Weave

The Weave installation is one area that the Weave development team admits is not as simple as it might be. Weave requires the installation of an application server, a database engine, Java™ technology, Flash, and the Weave middleware on your host. The team provides detailed instructions on how to install Weave on the Weave site (see Resources for a link to the user guide). If you don't feel up to the task, you can use the instance of Weave running at the team's site. Go to http://oicweave.org and then scroll down to the Demos section. Clicking the image creates an instance of Weave running in your browser through which you can begin to visualize data. That demonstration also incorporates sample data, which is a great way to see the more complex capabilities of Weave in the context of geographical data visualization.

Figure 2 displays what you find when you enter Weave: A set of pull-down menus and a few icons for importing data and replaying activities. As you create and evolve your visualizations, you can easily undo or redo your changes with these options.

Figure 2. Starting Weave (blank window)
Screen capture showing Weave on startup

Next, begin to import your data set. Click Data -> Load my data to display an import window. In this example, load your data set through the Load local file option. Click Next to see your data properly parsed and visible in the Import Data to Weave window, as in Figure 3:

Figure 3. Importing data into Weave
Screen capture showing the Import Data into Weave window

After you change the unique identifier column as appropriate, click Close to finish importing your data. (Weave also lets you define a data source through a URL, so that the data is retrieved and imported automatically. At the time of this writing, however, this option raises an error on the demonstration site.)
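
Before you pick a unique identifier column, it can help to check which columns actually hold a distinct value in every row. The following sketch does that with Python's csv module. The sample rows are made-up placeholders; only the TOTAL column name comes from the data set used in this article, and STATE and REGION are assumed names for illustration.

```python
import csv
import io

def unique_id_candidates(csv_text):
    """Return the column names whose values are non-empty and distinct in every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    candidates = []
    for name in reader.fieldnames or []:
        values = [(row[name] or "").strip() for row in rows]
        if rows and all(values) and len(set(values)) == len(values):
            candidates.append(name)
    return candidates

# Placeholder data: invented values, not taken from the real data set.
SAMPLE = """STATE,REGION,TOTAL
Alpha,East,100
Beta,East,250
Gamma,West,100
"""

if __name__ == "__main__":
    print(unique_id_candidates(SAMPLE))  # ['STATE']
```

In this sample, REGION and TOTAL both repeat a value, so only STATE qualifies as an identifier.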

With your data now imported into Weave, your next step is to create a simple plot of the data. Click Tools -> Add Scatterplot to display a scatter plot (although the plot does not yet meet your expectations). Click the vertical axis to change the attribute to TOTAL, and then click Save & Close to configure the plot. Your plot now looks similar to Figure 4:

Figure 4. Simple plot from your data imports
Screen capture showing the simple plot from your data import

To display the state name and the number of benefit recipients in that state, hover your mouse over any point in the scatter plot. Later you'll see how this useful feature becomes even more powerful in a multigraph environment.

Next, modify this scatter plot by restricting the plot to a subset of the data. Select the first six states in the plot. You see a dashed-line box temporarily around the data as you mouse over. After you select the data, the plot shows the first six states as normal points, with blurred data for the rest of the states. Right-click within the plot, and then click Create subset from selected record(s). The plot now looks like Figure 5:

Figure 5. Creating a plot from a subset of your data
Screen capture showing the plot created from a subset of your data

The selected region does not have to cover the first six states; you can drag the selection rectangle over any group of points in the plot.
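
Conceptually, a rectangle selection followed by a subset operation is a simple filter over the plotted records. The sketch below shows one way to express it; the Point type and both function names are hypothetical, and this is not how Weave itself is implemented.

```python
from typing import NamedTuple

class Point(NamedTuple):
    key: str   # unique identifier of the record
    x: float
    y: float

def select_in_rectangle(points, x0, y0, x1, y1):
    """Return the keys of points inside the rectangle dragged from (x0, y0) to (x1, y1)."""
    # Sort the corners so that the drag direction does not matter.
    left, right = sorted((x0, x1))
    bottom, top = sorted((y0, y1))
    return {p.key for p in points if left <= p.x <= right and bottom <= p.y <= top}

def subset(points, selected_keys):
    """Keep only the selected records, preserving their original order."""
    return [p for p in points if p.key in selected_keys]

if __name__ == "__main__":
    # Placeholder coordinates for illustration.
    pts = [Point("a", 1, 1), Point("b", 2, 5), Point("c", 9, 9)]
    keys = select_in_rectangle(pts, 3, 6, 0, 0)
    print(subset(pts, keys))
```

Selecting by record key rather than by position is a deliberate choice: the same set of keys can then be applied to any other plot of the same records.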

Another simple example comes from worldwide M1+ earthquakes data, which records earthquake information from the past seven days. Get this data from the URL: https://explore.data.gov/Geography-and-Environment/Worldwide-M1-Earthquakes-Past-7-Days/7tag-iwnu

Download the CSV format of the data, which consists of the source of the data, the date-time, the magnitude, and the location (and some other information that you will not incorporate). For this example, request a scatter plot, and use the longitude as the x-axis and the latitude as the y-axis. Also, request that the size of the points represent the magnitude of the earthquake. Then create a bar chart to represent the magnitude data. Together, the two plots illustrate one of the novel ideas in Weave: selecting a record in one plot highlights it in the others. In Figure 6, the cursor selects the largest-magnitude earthquake in the bar chart (a 7.7 magnitude earthquake in the Sea of Okhotsk at the time of this writing). The same sample is automatically highlighted in the scatter plot; it's the largest bubble in the upper right of Figure 6:

Figure 6. Visualizing earthquake data in multiple plots
Screen capture from Weave visualizing earthquake data in multiple plots
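
Sizing points by magnitude amounts to mapping a data range onto a range of marker sizes. The sketch below uses a plain linear scale; the function name, the pixel range, and the choice of a linear mapping are all assumptions for illustration, because the article does not describe how Weave computes point sizes.

```python
def scale_radius(value, lo, hi, min_radius=2.0, max_radius=20.0):
    """Map value from the data range [lo, hi] linearly onto [min_radius, max_radius]."""
    if hi == lo:
        # Every record has the same value, so use a mid-sized marker.
        return (min_radius + max_radius) / 2
    fraction = (value - lo) / (hi - lo)
    return min_radius + fraction * (max_radius - min_radius)

def radii_for(magnitudes):
    """Return one marker radius per magnitude, scaled to the observed range."""
    lo, hi = min(magnitudes), max(magnitudes)
    return [scale_radius(m, lo, hi) for m in magnitudes]

if __name__ == "__main__":
    # Placeholder magnitudes for illustration.
    print(radii_for([0.0, 5.0, 10.0]))  # [2.0, 11.0, 20.0]
```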

This ability to link data in multiple plots interactively is a great way to explore your data. This example is simple, but you can look at much more complex examples at the Weave website (including some that address geographical information). Note also in this example the coloring of the individual items. Weave automatically applies colors (which you can change easily) to identify the source of the data. The dark blue data points indicate that the source of the data was the United States, and the white samples indicate that the data was collected in Alaska.

As in the previous example, if you select and view a subset of the data in a plot, the data in all related plots is modified too, as a function of that subset. This linking capability of data representation in the plots is a unique feature and a great way to focus on a particular subset.
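
One common way to get this kind of linking is to have every plot share a single selection object that is keyed by record identifier. The sketch below illustrates the idea with a small observer pattern. The class names are hypothetical and the design is an assumption about how linked views can work in general, not a description of Weave's internals. The 7.7 magnitude comes from the example above; the other values are placeholders.

```python
class SharedSelection:
    """A set of selected record keys that notifies every registered view."""

    def __init__(self):
        self._keys = set()
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def select(self, keys):
        self._keys = set(keys)
        for listener in self._listeners:
            listener(self._keys)


class View:
    """A plot that highlights whatever the shared selection contains."""

    def __init__(self, name, records, selection):
        self.name = name
        self.records = records  # maps record key -> plotted value
        self.highlighted = set()
        self._selection = selection
        selection.subscribe(self._on_selection)

    def _on_selection(self, keys):
        # Highlight only the keys that this view actually plots.
        self.highlighted = keys & set(self.records)

    def click(self, key):
        # A click in one view updates the selection shared by all views.
        self._selection.select({key})


if __name__ == "__main__":
    selection = SharedSelection()
    bars = View("bar chart", {"q1": 7.7, "q2": 3.1}, selection)
    scatter = View("scatter plot", {"q1": (1.0, 2.0), "q2": (3.0, 4.0)}, selection)
    bars.click("q1")
    print(scatter.highlighted)  # {'q1'}
```

Because the views share keys rather than pixels, the bar chart and the scatter plot can show entirely different attributes and still stay in sync.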


The future of Weave

Weave remains a platform in beta, but more-advanced features are on the way to its growing population of users. The key challenge for the Weave platform is lowering the bar for visualizing data. Working with Weave can be more complex than working with spreadsheet solutions (particularly when you work with geographic data). Expect Weave developers to pay attention to this key goal of Weave.

One interesting feature now in development is the ability to collaborate on visualizations. People in multiple geographic locations can collaborate in real time on a visualization without third-party collaboration tools. Another innovation is the ability to tie mapping visualizations to other information. The project lead described this capability as a way to combine Google Maps with collections of other information, such as documents, to extend visualization to geographically located resources.

Liberating both data and the ability to visualize that data is an idea that is beginning to grow. Sites like Data.gov and tools like Weave might someday help everyone visualize data of interest and interactively reason about the data in new ways to understand it better. Perhaps someday Weave might be a core technology inside Data.gov and other open data sites, simplifying both access and visualization in one place. Although still in its infancy, this movement is a great start to bringing the power of infographics to everyone and increasing the value of open data.

Resources

Learn

  • Weave: Explore the Weave website, including a detailed user guide.
  • Data.gov: This official US government site is a next-generation platform for making public data universally accessible. Data.gov offers data not only through applications that simplify viewing but also as raw data sets and through RESTful APIs. Browse the data sets available through the Data.gov catalog. Filter your search based on your interests, along with types of data (such as raw, geographical, charts, or calendars).
  • Data.gov dialogue site: Interested data users can go here to help evolve Data.gov for the future. Through this site, view the current roadmap for Data.gov and see what other users are proposing. You can also vote on ideas or propose your own to help define where Data.gov goes in the future.
  • Science.gov: This portal provides access to more than 55 databases and 2,100 websites from 13 federal agencies for US government science information. As on Data.gov, you can restrict your searches by search criteria or by specific agencies.
  • SODA (Socrata Open Data API): SODA is a RESTful API for programmatic access to public data. Extract data in several data formats, including JavaScript Object Notation (JSON), XML, and CSV. This site also provides full documentation on the API and its usage, including a developer blog and list of sample applications that are built for open data.
  • Creating a Visualization Page in Weave: View a video that illustrates how to manipulate data within Weave.
  • Open data: Wikipedia provides a great introduction to the topic of open data, including its history and where you can find it.
  • developerWorks Open source technical topic: Find extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM products.
  • Watch developerWorks on-demand demos that range from product installation and setup demos for beginners, to advanced functionality for experienced developers.
  • Follow developerWorks on Twitter.
