A summary tree for the 40-cluster model was then computed by Ward's method, as shown above. Both the
k-means and the hierarchical cluster analyses optimize the same objective function, the Euclidean Sum of Squares (ESS); most other k-means programs are not consistent in this respect.
The nodes of the summary tree are automatically labelled by cluster exemplars, with cluster sizes in parentheses (these labels can be edited). The summary tree has also been optimally ordered, so that
the horizontal cluster order can be more easily interpreted.
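
The idea of building a summary tree from k-means clusters can be sketched as follows. This is a minimal illustration, not the ClustanGraphics implementation: it assumes each k-means cluster is represented only by its size and centre, and it uses the standard Ward criterion, under which fusing clusters p and q increases the ESS by n_p * n_q / (n_p + n_q) times the squared Euclidean distance between their centres. The function names and the four example clusters are hypothetical.

```python
from itertools import combinations


def ess_increase(size_p, centre_p, size_q, centre_q):
    """Increase in the Euclidean Sum of Squares if clusters p and q are fused."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(centre_p, centre_q))
    return size_p * size_q / (size_p + size_q) * sq_dist


def ward_summary_tree(sizes, centres):
    """Fuse clusters pairwise by Ward's method until one cluster remains.

    Returns the fusions as tuples (id_p, id_q, ess_increase, merged_size).
    New clusters receive ids continuing after the original ones.
    """
    clusters = {i: (sizes[i], tuple(centres[i])) for i in range(len(sizes))}
    next_id = len(sizes)
    fusions = []
    while len(clusters) > 1:
        # Pick the pair whose fusion gives the smallest increase in ESS.
        p, q = min(
            combinations(clusters, 2),
            key=lambda pq: ess_increase(*clusters[pq[0]], *clusters[pq[1]]),
        )
        (n_p, c_p), (n_q, c_q) = clusters.pop(p), clusters.pop(q)
        n = n_p + n_q
        # The merged centre is the size-weighted mean of the two centres.
        centre = tuple((n_p * a + n_q * b) / n for a, b in zip(c_p, c_q))
        fusions.append((p, q, ess_increase(n_p, c_p, n_q, c_q), n))
        clusters[next_id] = (n, centre)
        next_id += 1
    return fusions


# Hypothetical k-means result: four clusters described by size and centre.
sizes = [10, 12, 8, 20]
centres = [(1.0, 2.0), (1.2, 2.1), (6.0, 7.0), (6.5, 6.8)]
fusions = ward_summary_tree(sizes, centres)
for p, q, delta, n in fusions:
    print(f"fuse {p} and {q}: ESS increase {delta:.3f}, size {n}")
```

Because both stages minimise the same ESS quantity, the tree summarises the k-means solution without switching to a different notion of cluster quality.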
Because the 6-cluster section of the 40-cluster model was highlighted above, it is displayed in summary
form using Navigate Tree, as below. Clusters are identified by their level in the tree; thus Cluster +4 is the right-hand cluster at the 4-cluster level. Cluster sizes and the cluster means for the Response Rate
variable are displayed.
Cluster Profiles identifies significant cluster means in all the variables simultaneously. In the example,
the Response Rate variable is highlighted in red. The profile shows at a glance how the cluster means for all the variables compare at each level from 1 to 6 clusters.
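
One way to judge whether a cluster mean is significant is a t-statistic comparing it with the overall mean of the variable. The sketch below assumes a simple one-sample form, t = (cluster mean - overall mean) / (s / sqrt(n)); the text does not specify which t-test ClustanGraphics applies, so this is an illustration only. The function name and the data values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def profile_t(cluster_values, overall_mean):
    """One-sample t-statistic for a cluster mean against the overall mean."""
    n = len(cluster_values)
    standard_error = stdev(cluster_values) / sqrt(n)
    return (mean(cluster_values) - overall_mean) / standard_error


# Hypothetical Response Rate values for the cases in one cluster,
# compared with a hypothetical overall mean for the whole survey.
cluster_values = [6.5, 7.0, 7.2, 6.8, 7.0]
t_value = profile_t(cluster_values, overall_mean=4.5)
print(f"t = {t_value:.2f}")
```

A large absolute value of t suggests that the cluster mean differs markedly from the overall mean, which is the kind of variable a profile display would flag.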
It is easy to see that the 2-cluster level is differentiated on the Response Rate, with means of 2.02 in cluster -2 and 6.89 in cluster +2. The equivalent decision-tree rule for the first split, or final fusion, would
be: Response Rate > 4.5.
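
The rule can be written as a one-line test. The text does not say how the value 4.5 was obtained; the sketch below only notes that it lies close to the midpoint of the two cluster means quoted above, which is an observation rather than a statement of the method used. The function name is hypothetical.

```python
def split_rule(response_rate, threshold=4.5):
    """Apply the univariate rule: cases above the threshold go to cluster +2."""
    return "+2" if response_rate > threshold else "-2"


# Cluster means quoted in the text for the 2-cluster level.
mean_low, mean_high = 2.02, 6.89
midpoint = (mean_low + mean_high) / 2  # roughly 4.455, near the quoted 4.5

print(split_rule(mean_low), split_rule(mean_high), round(midpoint, 3))
```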
At the next level the first variable differentiates clusters -3 and +3. At the following cluster level, the first
3 variables are correlated in differentiating clusters -4 (high) and +4 (low), with variable 2 dominating.
Bear in mind that this is not a decision tree. Clusters are formed on all variables simultaneously, so the
analysis is multivariate at each clustering level.
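
The difference from a decision tree can be illustrated with a nearest-centre assignment that uses every variable at once. In this sketch only the Response Rate means (2.02 and 6.89) come from the text; the other two variables, their centre values, and the example case are hypothetical, chosen to show that a case satisfying Response Rate > 4.5 can still lie closer to cluster -2 when all variables are considered.

```python
def nearest_cluster(case, centres):
    """Return the label of the centre closest to the case in squared Euclidean distance."""
    def sq_dist(centre):
        return sum((x - c) ** 2 for x, c in zip(case, centre))
    return min(centres, key=lambda label: sq_dist(centres[label]))


# First coordinate: Response Rate means from the text.
# Second and third coordinates: hypothetical values for two further variables.
centres = {"-2": (2.02, 3.0, 1.0), "+2": (6.89, 5.0, 4.0)}

# Hypothetical case with Response Rate 4.8, which is above the 4.5 threshold.
case = (4.8, 3.1, 1.2)
assigned = nearest_cluster(case, centres)
print(assigned)
```

Here the univariate rule would place the case in cluster +2, whereas the multivariate distance places it in cluster -2, because the other two variables resemble that cluster much more closely.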
This example illustrated the following ClustanGraphics features: k-means analysis with outlier deletion on
a large survey, summary tree by hierarchical cluster analysis, optimal tree ordering, Navigate Tree, t-tests on variables and cluster profiling.