Working with the GPL (GGRAPH command)
The Chart Builder allows you to paste GGRAPH syntax. This syntax contains inline GPL.
You may want to edit the GPL to create a chart or add a feature that isn't available
from the Chart Builder. You can use the GPL documentation to help you. However, the
GPL documentation always uses unaggregated data and includes GPL statistics in the
examples to aggregate the data. The pasted syntax, on the other hand, may use data
aggregated by a GGRAPH summary function. Also, the pasted syntax includes defaults
that you may have to change when you edit the syntax. Therefore, it may not be obvious
how to adapt the pasted syntax to reproduce the examples. Following are some tips.
- Variables must be specified in two places: in the VARIABLES keyword in the GGRAPH command and in the DATA statements in the GPL. So, if you add a variable, make sure a reference to it appears in both places.
- Pasted syntax often uses the VARIABLES keyword to specify summary statistics. Like other variables, the variable created by the summary function (for example, MEAN_salary) is specified in a GPL DATA statement. You do not need to use GGRAPH summary functions. Instead, you can use the equivalent GPL statistic for aggregation. However, for very large data sets, you may find that pre-aggregating the data with GGRAPH is faster than using the aggregation in the GPL itself. Try both approaches and keep the one that works best for you. In the examples that follow, you can compare the different approaches.
- Make sure that you understand how the functions are being used in the GPL. You may need to modify one or more of them when you add a variable to pasted syntax. For example, if you change the dimension on which a categorical variable appears, you may need to change references to the dimension in the GUIDE and SCALE statements. If you are unsure about whether you need a particular function, try removing it and see if you get the results you expect.
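To make these tips concrete, here is a minimal sketch of what pasted syntax for a simple bar chart of mean salary by job category might look like. The dataset name "graphdataset" and the [name="MEAN_salary"] qualifier are assumptions based on typical Chart Builder output. The exact syntax you get depends on your version and chart settings. Note that each variable in the VARIABLES keyword has a matching DATA statement, and that the GUIDE and SCALE statements refer to dimensions by number.

```spss
* Illustrative sketch only. Actual pasted syntax may include additional defaults.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=jobcat MEAN(salary)[name="MEAN_salary"]
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  COMMENT: Each variable in VARIABLES has a matching DATA statement
  SOURCE: s=userSource(id("graphdataset"))
  DATA: jobcat=col(source(s), name("jobcat"), unit.category())
  DATA: MEAN_salary=col(source(s), name("MEAN_salary"))
  COMMENT: GUIDE and SCALE refer to dimensions by number
  GUIDE: axis(dim(1), label("Job Category"))
  GUIDE: axis(dim(2), label("Mean Salary"))
  SCALE: linear(dim(2), include(0))
  COMMENT: The data are pre-aggregated by GGRAPH so no GPL statistic is needed
  ELEMENT: interval(position(jobcat*MEAN_salary))
END GPL.
```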
Here's an example from the GPL documentation:
DATA: jobcat = col(source(s), name("jobcat"), unit.category())
DATA: gender = col(source(s), name("gender"), unit.category())
DATA: salary = col(source(s), name("salary"))
SCALE: linear(dim(2), include(0))
GUIDE: axis(dim(3), label("Gender"))
GUIDE: axis(dim(2), label("Mean Salary"))
GUIDE: axis(dim(1), label("Job Category"))
ELEMENT: interval(position(summary.mean(jobcat*salary*gender)))
The simplest way to use the example is to use unaggregated data and VARIABLES=ALL, like this:
GGRAPH
/GRAPHDATASET NAME="Employeedata" VARIABLES=ALL
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("Employeedata"))
DATA: jobcat = col(source(s), name("jobcat"), unit.category())
DATA: gender = col(source(s), name("gender"), unit.category())
DATA: salary = col(source(s), name("salary"))
SCALE: linear(dim(2), include(0))
GUIDE: axis(dim(3), label("Gender"))
GUIDE: axis(dim(2), label("Mean Salary"))
GUIDE: axis(dim(1), label("Job Category"))
ELEMENT: interval(position(summary.mean(jobcat*salary*gender)))
END GPL.
Note that specifying VARIABLES=ALL includes all the data in the graph. You can improve
performance by using only those variables that you need. In this example,
VARIABLES=jobcat gender salary would have been sufficient.
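As a sketch of that narrower specification, the following version changes only the VARIABLES keyword. The GPL is the same as in the VARIABLES=ALL example, because the data are still unaggregated and the GPL summary.mean statistic still does the aggregation.

```spss
* Same chart as the VARIABLES=ALL example, passing only the three variables the GPL uses.
GGRAPH
  /GRAPHDATASET NAME="Employeedata" VARIABLES=jobcat gender salary
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("Employeedata"))
  DATA: jobcat = col(source(s), name("jobcat"), unit.category())
  DATA: gender = col(source(s), name("gender"), unit.category())
  DATA: salary = col(source(s), name("salary"))
  SCALE: linear(dim(2), include(0))
  GUIDE: axis(dim(3), label("Gender"))
  GUIDE: axis(dim(2), label("Mean Salary"))
  GUIDE: axis(dim(1), label("Job Category"))
  COMMENT: Aggregation is done in the GPL by summary.mean
  ELEMENT: interval(position(summary.mean(jobcat*salary*gender)))
END GPL.
```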
You can also use aggregated data, as in the following example, which more closely resembles the pasted syntax:
GGRAPH
/GRAPHDATASET NAME="Employeedata" VARIABLES=jobcat gender MEAN(salary)
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("Employeedata"))
DATA: jobcat=col(source(s), name("jobcat"), unit.category())
DATA: gender=col(source(s), name("gender"), unit.category())
DATA: MEAN_salary=col(source(s), name("MEAN_salary"))
SCALE: linear(dim(2), include(0))
GUIDE: axis(dim(3), label("Gender"))
GUIDE: axis(dim(2), label("Mean Salary"))
GUIDE: axis(dim(1), label("Job Category"))
ELEMENT: interval(position(jobcat*MEAN_salary*gender))
END GPL.