How to Add Clustering to a Graph (GPL)
Clustering involves changes to the COORD statement and the ELEMENT statement. The following steps use the GPL shown in The Basics (GPL) as a "baseline" for the changes.
- Before modifying the COORD and ELEMENT statements, you need to define an additional categorical variable that will be used for clustering. This is specified by a DATA statement (note the unit.category() function):
  DATA: gender=col(source(s), name("gender"), unit.category())
- Now modify the COORD statement. If the GPL does not already include a COORD statement, as is the case for the baseline graph, you first need to add one:
  COORD: rect(dim(1,2))
  This makes the default coordinate system explicit.
- Next, add the cluster function to the coordinate system and specify the clustering dimension. In a 2-D coordinate system, this is the third dimension:
  COORD: rect(dim(1,2), cluster(3))
- Now add the clustering variable to the algebra. This variable is in the third position, corresponding to the clustering dimension specified by the cluster function in the COORD statement:
  ELEMENT: interval(position(summary.mean(jobcat*salary*gender)))
  Note that this algebra looks similar to the algebra for faceting. Without the cluster function added in the previous step, the resulting graph would be faceted. The cluster function essentially collapses the faceting into one axis: instead of a facet for each gender category, there is a cluster on the x axis for each category.
- Because clustering changes the dimensions, update the GUIDE statement so that it corresponds to the clustering dimension:
  GUIDE: axis(dim(3), label("Gender"))
- With these changes, the chart is clustered, but there is no way to distinguish the bars in each cluster. Add an aesthetic to distinguish them; here, color(jobcat) gives each job category its own color within every gender cluster:
  ELEMENT: interval(position(summary.mean(jobcat*salary*gender)), color(jobcat))
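For comparison, the following is a sketch of the same specification with cluster(3) left out of the COORD statement. Following the note about faceting in the steps above, gender is then treated as a faceting dimension, so the graph would show a separate panel of job-category bars for each gender rather than clusters along one axis. The source id and variable names are copied from the complete GPL in this section; only the COORD statement differs, and no color aesthetic is included because each panel holds a single bar per job category. Treat this as an illustrative sketch rather than a tested specification.

```gpl
SOURCE: s=userSource(id("graphdataset"))
DATA: jobcat=col(source(s), name("jobcat"), unit.category())
DATA: gender=col(source(s), name("gender"), unit.category())
DATA: salary=col(source(s), name("salary"))
COMMENT: No cluster() in COORD, so dim(3) (gender) becomes a facet
COORD: rect(dim(1,2))
SCALE: linear(dim(2), include(0))
GUIDE: axis(dim(2), label("Mean Salary"))
GUIDE: axis(dim(3), label("Gender"))
ELEMENT: interval(position(summary.mean(jobcat*salary*gender)))
```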
The complete GPL looks like the following.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=jobcat gender salary
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: jobcat=col(source(s), name("jobcat"), unit.category())
DATA: gender=col(source(s), name("gender"), unit.category())
DATA: salary=col(source(s), name("salary"))
COORD: rect(dim(1,2), cluster(3))
SCALE: linear(dim(2), include(0))
GUIDE: axis(dim(2), label("Mean Salary"))
GUIDE: axis(dim(3), label("Gender"))
ELEMENT: interval(position(summary.mean(jobcat*salary*gender)), color(jobcat))
END GPL.
Following is the graph created from the GPL.

Legend Label
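The graph includes a legend, but it has no label by default. To add a label to the legend, use a GUIDE statement that refers to the color aesthetic. Because color is mapped to jobcat, the label should describe the job categories:
GUIDE: legend(aesthetic(aesthetic.color), label("Employment Category"))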
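The legend describes whichever variable is mapped to the color aesthetic, and the clusters come from whichever variable sits in the dimension named by cluster(). As a sketch, the following swaps the roles used in the complete GPL above: jobcat moves to the third (clustering) position, so there is a cluster for each job category, and gender is mapped to color, so the legend label now names gender. The label strings are illustrative choices, not required values, and the GGRAPH command can stay the same because the same three variables are used.

```gpl
SOURCE: s=userSource(id("graphdataset"))
DATA: jobcat=col(source(s), name("jobcat"), unit.category())
DATA: gender=col(source(s), name("gender"), unit.category())
DATA: salary=col(source(s), name("salary"))
COMMENT: jobcat is now in the clustering dimension (dim 3)
COORD: rect(dim(1,2), cluster(3))
SCALE: linear(dim(2), include(0))
GUIDE: axis(dim(2), label("Mean Salary"))
GUIDE: axis(dim(3), label("Employment Category"))
COMMENT: The legend follows the color aesthetic, which now shows gender
GUIDE: legend(aesthetic(aesthetic.color), label("Gender"))
ELEMENT: interval(position(summary.mean(gender*salary*jobcat)), color(gender))
```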