In English, we use many different words to describe the same basic objects. In one survey, researchers Dieth and Orton explored which words were used for the place where a farmer might keep his cow, depending on where the speaker resided in England. The results include words like byre, shippon, mistall, cow-stable, cow-house, cow-shed, neat-house or beast-house. We see the same situation in visualization, where a two-dimensional chart with data displayed as a collection of points, using one variable for the horizontal axis and one for the vertical, is variously called a scatterplot, a scatter diagram, a scatter graph, a 2D dotplot or even a star field.
There have been a number of attempts to form taxonomies, or categorizations, of visualizations. Most software packages for creating graphics, such as Microsoft Excel, focus on the type of graphical element used to display the data and then sub-classify from that. This has one immediate problem: plots with multiple elements are hard to classify. Should a chart with bars and points be classified as a bar chart with added points, or as a point chart with added bars? Other authors have started with the dimensionality of the data (one-dimensional, two-dimensional, etc.) and used that as the basic classification criterion, but that approach has similar problems.
Visualizations are too numerous, too diverse and too exciting to fit well into a taxonomy that divides and subdivides. The evolution of animals and plants did occur essentially in a tree-like manner, with branches splitting and sub-splitting. Information visualization techniques, by contrast, have largely been invented by composition. We take a polar coordinate system, combine it with bars, and obtain a rose diagram. We put a network in 3D. We add texture, shape and size mappings to all of the above. We split it into panels. This is why a traditional taxonomy of information visualization is doomed to be unsatisfying: it rests on a false analogy with biology and denies the basic process by which visualizations have been created, namely composition.
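The rose-diagram example can be made concrete with a small sketch. It assumes a hypothetical helper, `bars_to_wedges`, which is not taken from SPSS or any library. The helper composes two parts. It takes the bars of an ordinary bar chart, with one category and one value per bar, and re-expresses them in polar coordinates. Each category's position becomes an angular slot and each bar's height becomes a radius. No "rose chart" type is coded anywhere; the wedges follow from combining bars with a polar coordinate system.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Wedge:
    """A single bar re-expressed in polar coordinates (hypothetical type)."""
    category: str
    start_angle: float  # radians, measured from the positive x axis
    end_angle: float    # radians
    radius: float       # outer radius; the inner radius is 0


def bars_to_wedges(categories, values):
    """Hypothetical helper: map a bar chart onto polar coordinates.

    Each category gets an equal angular slot (the bar's x extent becomes an
    angle) and the bar height becomes the wedge radius. The radius is linear
    in the value, which is an assumption; an area-proportional rose would use
    math.sqrt(value) instead.
    """
    if not categories or len(categories) != len(values):
        raise ValueError("need one value per category, and at least one category")
    slot = 2 * math.pi / len(categories)
    return [
        Wedge(cat, i * slot, (i + 1) * slot, float(val))
        for i, (cat, val) in enumerate(zip(categories, values))
    ]


def wedge_outline(wedge, steps=8):
    """Return (x, y) points tracing a wedge: centre, along the arc, back to centre."""
    points = [(0.0, 0.0)]
    for k in range(steps + 1):
        theta = wedge.start_angle + (wedge.end_angle - wedge.start_angle) * k / steps
        points.append((wedge.radius * math.cos(theta), wedge.radius * math.sin(theta)))
    points.append((0.0, 0.0))
    return points


if __name__ == "__main__":
    for w in bars_to_wedges(["N", "E", "S", "W"], [3, 5, 2, 4]):
        print(f"{w.category}: {w.start_angle:.2f} to {w.end_angle:.2f} rad, r = {w.radius}")
```

The outline points could be handed to any polygon-drawing routine. The point of the sketch is that the bar logic and the polar transform remain separate pieces that happen to be combined.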
Within SPSS we have adopted a different approach: we look at charts and visualizations as a language in which we compose “parts of speech” into sentences. This approach was pioneered by Leland Wilkinson in his book The Grammar of Graphics. Consider natural language grammars. A sentence is defined by a number of elements that are connected together using simple rules. A well-formed sentence has a certain structure, but within that structure you are free to use a wide variety of nouns, verbs, adjectives and the like. In the same way, a visualization can be defined by a collection of “parts of graphical speech.” A well-formed visualization has a structure, but within that structure you are free to substitute a variety of different items for each part of speech. In a language, we can make sentences that are well-formed but nonsensical. In the same way, under the graphical grammar, we can define visualizations that are well-formed but nonsensical. One reason not to ban such seeming nonsense is that you never know how language will change to make something meaningful. A chart that a designer sees no use for today may become valuable in a unique situation, or for some particular data. “The tasty aged phone whistles a pink” might be meaningless, but “the sweet young thing sings the blues” is a useful statement, and the two are grammatically similar. In our grammar-based approach, we have a set of different “parts of speech” that we compose:
- data: the variables that are to be used.
- coordinates: the basic system in which the data will be displayed, together with any transformations of the coordinate system, such as polarization, reflection, etc.
- elements: the graphic glyphs used to represent data, such as points, lines, areas, …
- statistics: mathematical and statistical functions used to modify the data as it is drawn into the coordinate frame.
- aesthetics: mappings from data to graphical attributes such as color, shape, size, …
- faceting: dividing a graphic into multiple smaller graphics, also known as paneling, trellis, …
- guides: axes, legends and other items that annotate the main graphic.
- interactivity: methods that allow users to interact with the graphics, such as drilldown, zooming, tooltips, …
- styles: decorations that do not affect the basic structure of the graphic but modify its final appearance, such as fonts, default colors, padding and margins, …
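The list above can be read as the schema of a chart specification, with one slot for each part of speech. The sketch below is a minimal illustration under that reading. The class `ChartSpec`, the function `describe`, and the string values used to fill the slots (`"rect"`, `"point"`, `"axis.x"` and so on) are all hypothetical. They do not reproduce SPSS's actual specification syntax.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ChartSpec:
    """Hypothetical container with one slot per part of graphical speech.

    Slot names follow the list of parts above; the string vocabulary used to
    fill them is an illustrative assumption, not a real product's syntax.
    """
    data: list[str]                                             # variables used
    coordinates: str = "rect"                                   # e.g. "rect", "polar"
    elements: list[str] = field(default_factory=lambda: ["point"])
    statistics: list[str] = field(default_factory=list)         # e.g. "mean", "count"
    aesthetics: dict[str, str] = field(default_factory=dict)    # attribute -> variable
    faceting: list[str] = field(default_factory=list)           # panel-by variables
    guides: list[str] = field(default_factory=lambda: ["axis.x", "axis.y"])
    interactivity: list[str] = field(default_factory=list)      # e.g. "tooltip"
    styles: dict[str, str] = field(default_factory=dict)        # e.g. fonts, padding


def describe(spec: ChartSpec) -> str:
    """Return the non-empty parts of a spec as 'part: value' lines, in slot order."""
    return "\n".join(
        f"{f.name}: {getattr(spec, f.name)}"
        for f in fields(spec)
        if getattr(spec, f.name)
    )


# A scatterplot: two positional variables in rectangular coordinates drawn as
# points, with color mapped from a third variable. Every other part keeps its default.
scatter = ChartSpec(data=["height", "weight", "sex"], aesthetics={"color": "sex"})
print(describe(scatter))
```

The structure is fixed, with nine named slots. What fills each slot is free, and this is the sense in which a visualization is "well-formed."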
The core concept behind our approach is that you should be able to take a chart, change its description to replace one part with a similar part, and get a well-defined and potentially useful result. The result is a system in which the limits of what you can display depend neither on how well you can do graphical programming nor on how well the program you use has implemented a feature. They depend instead simply on combining well-known parts into novel systems.
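The substitution principle can be sketched in a few lines. A chart description is treated as an immutable record, and `dataclasses.replace` swaps exactly one part. A well-formedness check runs on every new record. The `Spec` class and the `VOCABULARY` table are hypothetical and exist only for illustration. The check tests only whether each slot holds a known word, not whether the combination makes sense. This follows the grammar's stance on well-formed nonsense.

```python
from dataclasses import dataclass, replace

# Hypothetical vocabularies: the words allowed in each slot. These are
# illustrative assumptions, not an exhaustive or official list.
VOCABULARY = {
    "coordinates": {"rect", "polar", "rect+reflect"},
    "element": {"point", "line", "area", "bar"},
    "statistic": {"identity", "count", "mean"},
}


@dataclass(frozen=True)
class Spec:
    x: str
    y: str
    coordinates: str = "rect"
    element: str = "point"
    statistic: str = "identity"

    def __post_init__(self):
        # Well-formedness only: each slot must hold a word from its vocabulary.
        # Whether the resulting chart is *sensible* is deliberately not checked.
        for slot, allowed in VOCABULARY.items():
            value = getattr(self, slot)
            if value not in allowed:
                raise ValueError(f"{slot}={value!r} is not one of {sorted(allowed)}")


bar_chart = Spec(x="month", y="deaths", element="bar")
rose = replace(bar_chart, coordinates="polar")   # swap the coordinates: a rose diagram
line_chart = replace(bar_chart, element="line")  # swap the element: a line chart
odd = replace(rose, element="area", statistic="count")  # well-formed, meaning unclear
```

Because `replace` builds a new instance through `__init__`, every substituted spec passes through the same check. A word outside the vocabulary, such as `element="sparkle"`, is rejected. An unusual but grammatical combination like `odd` is accepted.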