Working with the GPL (GGRAPH command)

The Chart Builder allows you to paste GGRAPH syntax. This syntax contains inline GPL.

You may want to edit the GPL to create a chart or add a feature that isn't available from the Chart Builder. You can use the GPL documentation to help you. However, the examples in the GPL documentation always use unaggregated data and rely on GPL statistics to aggregate it. The pasted syntax, on the other hand, may use data that has already been aggregated by a GGRAPH summary function. The pasted syntax also includes defaults that you may have to change when you edit it. As a result, it may not be obvious how to adapt the pasted syntax to reproduce the documentation examples. The following tips can help.

  • Variables must be specified in two places: in the VARIABLES keyword in the GGRAPH command and in the DATA statements in the GPL. So, if you add a variable, make sure a reference to it appears in both places.
  • Pasted syntax often uses the VARIABLES keyword to specify summary statistics. Like any other variable, the resulting summary variable (for example, MEAN_salary) must also be named in a GPL DATA statement. You do not need to use GGRAPH summary functions. Instead, you can use the equivalent GPL statistic for aggregation. However, for very large data sets, you may find that pre-aggregating the data with GGRAPH is faster than aggregating in the GPL itself. Try both approaches and keep the one that works better for you. In the examples that follow, you can compare the different approaches.
  • Make sure that you understand how the functions are being used in the GPL. You may need to modify one or more of them when you add a variable to pasted syntax. For example, if you change the dimension on which a categorical variable appears, you may need to change references to the dimension in the GUIDE and SCALE statements. If you are unsure about whether you need a particular function, try removing it and see if you get the results you expect.

Here's an example from the GPL documentation:

Figure 1. Example from GPL documentation
DATA: jobcat = col(source(s), name("jobcat"), unit.category())
DATA: gender = col(source(s), name("gender"), unit.category())
DATA: salary = col(source(s), name("salary"))
SCALE: linear(dim(2), include(0))
GUIDE: axis(dim(3), label("Gender"))
GUIDE: axis(dim(2), label("Mean Salary"))
GUIDE: axis(dim(1), label("Job Category"))
ELEMENT: interval(position(summary.mean(jobcat*salary*gender)))

The simplest way to run the example is to supply unaggregated data with VARIABLES=ALL, like this:

Figure 2. Modified example with unaggregated data
GGRAPH
  /GRAPHDATASET NAME="Employeedata" VARIABLES=ALL
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("Employeedata"))
DATA: jobcat = col(source(s), name("jobcat"), unit.category())
DATA: gender = col(source(s), name("gender"), unit.category())
DATA: salary = col(source(s), name("salary"))
SCALE: linear(dim(2), include(0))
GUIDE: axis(dim(3), label("Gender"))
GUIDE: axis(dim(2), label("Mean Salary"))
GUIDE: axis(dim(1), label("Job Category"))
ELEMENT: interval(position(summary.mean(jobcat*salary*gender)))
END GPL.

Note that specifying VARIABLES=ALL passes every variable in the active dataset to the graph, including variables that the GPL never references. You can improve performance by listing only the variables that you need. In this example, VARIABLES=jobcat gender salary would have been sufficient.
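
As a minimal sketch of the reduced variable list described above, the same chart can be requested with only the three variables that the GPL references. The GPL block is unchanged from the unaggregated example. The comment lines (the SPSS * comment and the GPL COMMENT statement) are additions for illustration only.

* Pass only the variables that the GPL DATA statements reference.
GGRAPH
  /GRAPHDATASET NAME="Employeedata" VARIABLES=jobcat gender salary
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("Employeedata"))
  COMMENT: Each variable listed on VARIABLES has a matching DATA statement
  DATA: jobcat = col(source(s), name("jobcat"), unit.category())
  DATA: gender = col(source(s), name("gender"), unit.category())
  DATA: salary = col(source(s), name("salary"))
  SCALE: linear(dim(2), include(0))
  GUIDE: axis(dim(3), label("Gender"))
  GUIDE: axis(dim(2), label("Mean Salary"))
  GUIDE: axis(dim(1), label("Job Category"))
  COMMENT: The mean is still computed by the GPL statistic, not by GGRAPH
  ELEMENT: interval(position(summary.mean(jobcat*salary*gender)))
END GPL.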

You can also use aggregated data, as in the following example, which more closely resembles the pasted syntax:

Figure 3. Modified example with aggregated data
GGRAPH
  /GRAPHDATASET NAME="Employeedata" VARIABLES=jobcat gender MEAN(salary)
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("Employeedata"))
  DATA: jobcat=col(source(s), name("jobcat"), unit.category())
  DATA: gender=col(source(s), name("gender"), unit.category())
  DATA: MEAN_salary=col(source(s), name("MEAN_salary"))
  SCALE: linear(dim(2), include(0))
  GUIDE: axis(dim(3), label("Gender"))
  GUIDE: axis(dim(2), label("Mean Salary"))
  GUIDE: axis(dim(1), label("Job Category"))
  ELEMENT: interval(position(jobcat*MEAN_salary*gender))
END GPL.
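
In the aggregated example, the DATA statement refers to MEAN_salary because that is the name GGRAPH gives the summary variable. Pasted syntax usually states this name explicitly with a [name="..."] qualifier after the summary function. The following sketch assumes that qualifier and uses a hypothetical name, meansal, to show that whatever name is assigned in VARIABLES must be repeated in the DATA statement and in the ELEMENT statement.

* meansal is a hypothetical name chosen for illustration.
GGRAPH
  /GRAPHDATASET NAME="Employeedata" VARIABLES=jobcat gender MEAN(salary)[name="meansal"]
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("Employeedata"))
  DATA: jobcat=col(source(s), name("jobcat"), unit.category())
  DATA: gender=col(source(s), name("gender"), unit.category())
  COMMENT: The name here must match the name assigned on the VARIABLES keyword
  DATA: meansal=col(source(s), name("meansal"))
  SCALE: linear(dim(2), include(0))
  GUIDE: axis(dim(3), label("Gender"))
  GUIDE: axis(dim(2), label("Mean Salary"))
  GUIDE: axis(dim(1), label("Job Category"))
  COMMENT: No summary.mean here because GGRAPH has already aggregated the data
  ELEMENT: interval(position(jobcat*meansal*gender))
END GPL.

If the chart comes up empty or reports an unknown variable, the first thing to check is that the three references to the summary variable name agree.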