Brief Overview of GPL Algebra

Before you can use all of the functions and statements in GPL, it is important to understand its algebra. The algebra determines how data are combined to specify the position of graphic elements in the graph. That is, the algebra defines the graph dimensions or the data frame in which the graph is drawn. For example, the frame of a basic scatterplot is specified by the values of one variable crossed with the values of another variable. Another way of thinking about the algebra is that it identifies the variables you want to analyze in the graph.

The GPL algebra can specify one or more variables. If it includes more than one variable, you must use one of the following operators:

  • Cross (*). The cross operator crosses all of the values of one variable with all of the values of another variable. A result exists for every case (row) in the data. The cross operator is the most commonly used operator. It is used whenever the graph includes more than one axis, with a different variable on each axis. Each variable on each axis is crossed with each variable on the other axes (for example, A*B results in A on the x axis and B on the y axis when the coordinate system is 2-D). Crossing can also be used for paneling (faceting) when there are more crossed variables than there are dimensions in a coordinate system. That is, if the coordinate system were 2-D rectangular and three variables were crossed, the last variable would be used for paneling (for example, with A*B*C, C is used for paneling when the coordinate system is 2-D).
  • Nest (/). The nest operator nests all of the values of one variable in all of the values of another variable. The difference between crossing and nesting is that a result exists only when there is a corresponding value in the variable that nests the other variable. For example, city/state nests the city variable in the state variable. A result will exist for each city and its appropriate state, not for every combination of city and state. Therefore, there will not be a result for Chicago and Montana. Nesting always results in paneling, regardless of the coordinate system.
  • Blend (+). The blend operator combines all of the values of one variable with all of the values of another variable. For example, you may want to combine two salary variables on one axis. Blending is often used for repeated measures, as in salary2004+salary2005.
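
The difference between the three operators can be sketched with a small toy model in Python. This is only an illustration under simplifying assumptions, not GPL itself and not a description of how GPL is implemented. Each variable is assumed to be a list with one value per case, and the helper names (cross, nest, blend) and the sample data are hypothetical.

```python
from itertools import product

# Hypothetical sample data: one value per case (row).
city = ["Chicago", "Springfield", "Helena"]
state = ["Illinois", "Illinois", "Montana"]

def cross(a, b):
    """Pair values case by case; the frame allows every combination of values."""
    results = list(zip(a, b))
    frame = set(product(set(a), set(b)))
    return results, frame

def nest(a, b):
    """Pair values case by case; the frame allows only combinations that occur."""
    results = list(zip(a, b))
    frame = set(results)
    return results, frame

def blend(a, b):
    """Combine the values of both variables into a single variable."""
    return list(a) + list(b)

crossed_results, crossed_frame = cross(city, state)
nested_results, nested_frame = nest(city, state)

print(len(crossed_results))                     # one result per case
print(("Chicago", "Montana") in crossed_frame)  # crossing leaves room for this pair
print(("Chicago", "Montana") in nested_frame)   # nesting does not
print(blend([40000, 52000], [42000, 55000]))    # two salary variables on one axis
```

In this sketch, crossing and nesting produce the same results for the cases. They differ only in which combinations the frame makes room for.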

Crossing and nesting add dimensions to the graph specification. Blending combines the values into one dimension. How the dimensions are interpreted and drawn depends on the coordinate system. See How Coordinates and the GPL Algebra Interact for details about the interaction between the coordinate system and the algebra.
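
As a rough illustration of how the operators affect the number of dimensions, the following Python sketch represents each case as a tuple whose length is the number of dimensions. The representation and the helper names are assumptions made for this example, not part of GPL. Nesting is omitted because, in this simplified model, it builds the same tuples as crossing.

```python
def var(values):
    """A one-dimensional variable: one 1-tuple per case."""
    return [(v,) for v in values]

def cross(a, b):
    """Crossing joins tuples case by case, so the dimensions add up."""
    return [ta + tb for ta, tb in zip(a, b)]

def blend(a, b):
    """Blending appends cases, so the number of dimensions stays the same."""
    return a + b

def dimensions(varset):
    """Number of dimensions, read from the first case."""
    return len(varset[0])

salary2004 = var([40000, 52000])
salary2005 = var([42000, 55000])
jobcat = var(["Clerical", "Manager"])

print(dimensions(cross(jobcat, salary2004)))      # 2
print(dimensions(blend(salary2004, salary2005)))  # 1
```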

Rules

Like elementary mathematical algebra, GPL algebra has associative and distributive rules, but its operators are not commutative. All operators are associative:

(X*Y)*Z = X*(Y*Z)
(X/Y)/Z = X/(Y/Z)
(X+Y)+Z = X+(Y+Z)

The cross and nest operators are also distributive over the blend operator:

X*(Y+Z) = X*Y+X*Z
X/(Y+Z) = X/Y+X/Z

However, GPL algebra operators are not commutative. That is,

X*Y ≠ Y*X
X/Y ≠ Y/X
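
These rules can be checked on a toy model. The Python sketch below is an assumption-laden illustration, not GPL: a variable is modeled as a set of (case, tuple) pairs, crossing joins the tuples that belong to the same case, and blending pools the results. The helper names are hypothetical. Nesting is not modeled separately because it would build the same tuples as crossing here.

```python
def var(values):
    """One (case, 1-tuple) pair per case."""
    return {(case, (v,)) for case, v in enumerate(values)}

def cross(a, b):
    """Join the tuples that belong to the same case."""
    return {(i, ta + tb) for i, ta in a for j, tb in b if i == j}

def blend(a, b):
    """Pool the results of both operands."""
    return a | b

X, Y, Z = var([1, 2, 3]), var([10, 20, 30]), var([7, 8, 9])

print(cross(cross(X, Y), Z) == cross(X, cross(Y, Z)))            # True: associative
print(blend(blend(X, Y), Z) == blend(X, blend(Y, Z)))            # True: associative
print(cross(X, blend(Y, Z)) == blend(cross(X, Y), cross(X, Z)))  # True: distributive
print(cross(X, Y) == cross(Y, X))                                # False: not commutative
```

Crossing is not commutative in this model because the order of values inside each tuple decides which dimension a variable lands on.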

Operator Precedence

The nest operator takes precedence over the other operators, and the cross operator takes precedence over the blend operator. Like mathematical algebra, the precedence can be changed by using parentheses. You will almost always use parentheses with the blend operator because the blend operator has the lowest precedence. For example, to blend variables before crossing or nesting the result with other variables, you would do the following:

(A+B)*C

However, note that there are some cases in which you will cross then blend. For example, consider the following.

(A*C)+(B*D)

In this case, the variables are crossed first because there is no way to untangle the variable values after they are blended. A needs to be crossed with C and B needs to be crossed with D. Therefore, using (A+B)*(C+D) won't work. (A*C)+(B*D) crosses the correct variables and then blends the results together.

Note: In this last example, the parentheses are superfluous, because the cross operator's higher precedence ensures that the crossing occurs before the blending. The parentheses are used for readability.
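
To see why (A+B)*(C+D) is not a substitute, note that the distributive rule expands it to A*C+A*D+B*C+B*D, which includes the unwanted pairings A*D and B*C. The Python sketch below demonstrates this on a toy model. The model (sets of (case, tuple) pairs), the helper names, and the sample values are assumptions made for illustration and are not part of GPL.

```python
def var(values):
    """One (case, 1-tuple) pair per case."""
    return {(case, (v,)) for case, v in enumerate(values)}

def cross(a, b):
    """Join the tuples that belong to the same case."""
    return {(i, ta + tb) for i, ta in a for j, tb in b if i == j}

def blend(a, b):
    """Pool the results of both operands."""
    return a | b

A, B = var([1, 2]), var([3, 4])
C, D = var([10, 20]), var([30, 40])

right = blend(cross(A, C), cross(B, D))    # (A*C)+(B*D)
wrong = cross(blend(A, B), blend(C, D))    # (A+B)*(C+D)

print(len(right), len(wrong))   # 4 8
print(right < wrong)            # True: the intended results are only a subset
print((0, (1, 30)) in wrong)    # True: a value of A paired with a value of D
```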

Analysis Variable

Statistics other than count-based statistics require an analysis variable. The analysis variable is the variable on which a statistic is calculated. In a 1-D graph, this is the first variable in the algebra. In a 2-D graph, this is the second variable. Finally, in a 3-D graph, it is the third variable.

In all of the following, salary is the analysis variable:

  • 1-D. summary.sum(salary)
  • 2-D. summary.mean(jobcat*salary)
  • 3-D. summary.mean(jobcat*gender*salary)
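
The role of the analysis variable can be sketched in Python. The function below is a hypothetical helper, not a GPL function. It assumes that each case is a tuple produced by crossing, that the variable at position n_dims is the analysis variable, and that the variables before it define the groups.

```python
from collections import defaultdict

def summary_mean(rows, n_dims):
    """Average the analysis variable (position n_dims) within each group
    formed by the variables that precede it."""
    groups = defaultdict(list)
    for row in rows:
        key = row[:n_dims - 1]
        groups[key].append(row[n_dims - 1])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Cases of jobcat*salary in a 2-D graph: salary is the second variable.
rows_2d = [("Clerical", 30000), ("Clerical", 40000), ("Manager", 80000)]
print(summary_mean(rows_2d, 2))

# Cases of jobcat*gender*salary in a 3-D graph: salary is the third variable.
rows_3d = [("Clerical", "f", 30000), ("Clerical", "m", 40000),
           ("Manager", "f", 80000), ("Manager", "f", 90000)]
print(summary_mean(rows_3d, 3))
```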

The previous rules apply only to algebra used in the position function. Algebra can be used elsewhere (as in the color and label functions), in which case the only variable in the algebra is the analysis variable. For example, in the following ELEMENT statement for a 2-D graph, the analysis variable is salary in the position function and the label function.

ELEMENT: interval(position(summary.mean(jobcat*salary)), label(summary.mean(salary)))

Unity Variable

The unity variable (indicated by 1) is a placeholder in the algebra. It is not the same as the numeric value 1. When a scale is created for the unity variable, unity is located in the middle of the scale but no other values exist on the scale. The unity variable is needed only when there is no explicit variable in a specific dimension and you need to include the dimension in the algebra.

For example, assume a 2-D rectangular coordinate system. If you are creating a graph showing the count in each jobcat category, summary.count(jobcat) appears in the GPL specification. Counts are shown along the y axis, but there is no explicit variable in that dimension. If you want to panel the graph, you need to specify something in the second dimension before you can include the paneling variable. Thus, if you want to panel the graph by columns using gender, you need to change the specification to summary.count(jobcat*1*gender). If you want to panel by rows instead, there would be another unity variable to indicate the missing third dimension. The specification would change to summary.count(jobcat*1*1*gender).
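
The placement of the unity variable follows a simple pattern, sketched below as a Python string builder. The function name and its parameters are hypothetical. It only assembles the algebra text shown in this section for a count chart in a 2-D rectangular coordinate system.

```python
def count_position(variable, panel_var=None, panel_by="columns"):
    """Build the summary.count algebra for a 2-D rectangular coordinate system,
    padding the dimensions that have no explicit variable with the unity variable."""
    if panel_var is None:
        return f"summary.count({variable})"
    if panel_by == "columns":
        placeholders = ["1"]         # fills the second (y) dimension
    elif panel_by == "rows":
        placeholders = ["1", "1"]    # fills the second and third dimensions
    else:
        raise ValueError("panel_by must be 'columns' or 'rows'")
    algebra = "*".join([variable, *placeholders, panel_var])
    return f"summary.count({algebra})"

print(count_position("jobcat"))
print(count_position("jobcat", "gender", "columns"))
print(count_position("jobcat", "gender", "rows"))
```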

You can't use the unity variable to compute statistics that require an analysis variable (like summary.mean). However, you can use it with count-based statistics (like summary.count and summary.percent.count).

User Constants

The algebra can also include user constants, which are quoted string values (for example, "2005"). When a user constant is included in the algebra, it is like adding a new variable, with the variable's value equal to the constant for all cases. The effect of this depends on the algebra operators and the function in which the user constant appears.

In the position function, the constants can be used to create separate scales. For example, in the following GPL, two separate scales are created for the paneled graph. By nesting the values of each variable in a different string and blending the results, two different groups of cases with different scale ranges are created.

ELEMENT: line(position(date*(calls/"Calls"+orders/"Orders")))

For a full example, see Line Chart with Separate Scales (GPL).

If the cross operator is used instead of the nest operator, both categories will have the same scale range. The panel structures will also differ.

ELEMENT: line(position(date*calls*"Calls"+date*orders*"Orders"))
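
The practical effect on the scale ranges can be illustrated with a short Python sketch. It does not reproduce how GPL computes scales. It only contrasts a range computed separately for each group (as described for nesting) with one range shared by all groups (as described for crossing). The data values and the function name are hypothetical.

```python
# Hypothetical values for the two variables.
groups = {
    "Calls": [120, 150, 90],
    "Orders": [12, 18, 9],
}

def scale_ranges(groups, shared):
    """Return the (min, max) range of the scale used for each group."""
    if shared:
        everything = [v for values in groups.values() for v in values]
        common = (min(everything), max(everything))
        return {name: common for name in groups}
    return {name: (min(values), max(values)) for name, values in groups.items()}

print(scale_ranges(groups, shared=False))  # separate ranges, as with nesting
print(scale_ranges(groups, shared=True))   # one common range, as with crossing
```

With a shared range, the variable with the smaller values occupies only a small part of its scale, which is why separate scales can be preferable when the variables differ greatly in magnitude.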

Constants can also be used in the position function to create a category of all cases when the constant is blended with a categorical variable. Because the value of the user constant applies to every case, the category it creates contains all of the cases, which is why the following works:

ELEMENT: interval(position(summary.mean((jobcat+"All")*salary)))

For a full example, see Simple Bar Chart with Bar for All Categories (GPL).
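
The effect of blending a constant with a categorical variable can be sketched in Python. This is a toy model with hypothetical data, not GPL. It assumes that each case contributes once under its own jobcat value and once under the constant "All", so the mean for "All" is the mean over every case.

```python
from collections import defaultdict

# Hypothetical sample data: one value per case.
jobcat = ["Clerical", "Clerical", "Manager"]
salary = [30000, 40000, 80000]

# The user constant behaves like a variable with the same value for all cases.
constant = ["All"] * len(jobcat)

# Blend jobcat with the constant, pairing each half with salary.
rows = list(zip(jobcat, salary)) + list(zip(constant, salary))

groups = defaultdict(list)
for category, value in rows:
    groups[category].append(value)

means = {category: sum(vals) / len(vals) for category, vals in groups.items()}
print(means)
```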

Aesthetic functions can also take advantage of user constants. Blending variables creates multiple graphic elements for the same case. To distinguish each group, you can mimic the blending in the aesthetic function—this time with user constants.

ELEMENT: point(position(jobcat*(salbegin+salary)), color("Beginning"+"Current"))
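
How the blended constants line up with the blended variables can be sketched in Python. This is an illustrative model with hypothetical data, not GPL. It assumes that blending appends the cases of the second operand to those of the first, so the two blends stay aligned.

```python
# Hypothetical sample data: one value per case.
jobcat = ["Clerical", "Manager"]
salbegin = [20000, 50000]
salary = [30000, 80000]
n = len(jobcat)

y = salbegin + salary                         # salbegin+salary
color = ["Beginning"] * n + ["Current"] * n   # "Beginning"+"Current"
x = jobcat + jobcat                           # jobcat is crossed with each blended half

points = list(zip(x, y, color))
for point in points:
    print(point)
```

Each case yields two points, and the color value identifies which of the blended variables a point came from.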

User constants are not required to create most charts, so you can ignore them in the beginning. However, as you become more proficient with GPL, you may want to return to them to create custom graphs.