ClustanGraphics allows you to run very powerful clustering algorithms on different data types, with or without missing values and differential case or variable weighting. Having read your data, either specify your variable types using an Auto Script, or select Edit/Data Types and specify them interactively using the following dialogue:
The example shown here illustrates the four types of variable allowed in ClustanGraphics (binary, nominal, ordinal and continuous) and the two data transformations (range and z-scores). These apply as follows:
Binary
Two codes other than missing, the higher code signifying "yes" or "present", the lower code signifying "no" or "absent" (e.g. CreditAllowed, meaning whether the client has credit terms).
Nominal
Integer codes having no logical numerical order (e.g. AccountType or ClientSector).
Ordinal
Integer codes having a logical numerical order (e.g. VolumeLevel, by band).
Continuous
Wide range of numerical values on a continuous or semi-continuous scale (e.g. InvoiceValue, or the actual value of the current contract).
When you have completed a cluster analysis with mixed data types, the results are easily and flexibly presented in our cluster model dialogue, shown here.
Data Types
On first entry, ClustanGraphics examines your data and tries to interpret the type of each variable according to whether its values are integers and how frequently they occur. This interpretation may be correct; for example, if all your variables are binary, each has only two possible values and will be interpreted as binary. Nominal and ordinal variables, however, are both interpreted as nominal, so you should change the type of any such variable that is ordinal. To do this, click on the type cell and select from the drop-down list (right).
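The kind of automatic interpretation described above can be illustrated with a small sketch. This is not the rule ClustanGraphics applies; the function name guess_variable_type and the max_categories threshold are hypothetical, and None is assumed to mark a missing value. The sketch shows why ordinal codes end up classed as nominal: nothing in the values themselves says whether their order is meaningful.

```python
from collections import Counter

def guess_variable_type(values, max_categories=10):
    """Guess 'binary', 'nominal' or 'continuous' for one variable.

    Hypothetical heuristic for illustration only. None marks a missing value.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("variable has no valid values")
    counts = Counter(valid)  # frequency of each distinct value
    all_integers = all(float(v).is_integer() for v in valid)
    if len(counts) == 2:
        return "binary"      # two codes other than missing
    if all_integers and len(counts) <= max_categories:
        return "nominal"     # integer codes; order cannot be inferred
    return "continuous"

# Ordinal band codes look exactly like nominal codes to the heuristic,
# so the type has to be corrected by hand.
volume_level = [1, 2, 3, 2, 1, 3]
guessed = guess_variable_type(volume_level)
```

Here guessed is "nominal" even though VolumeLevel is ordinal, which is the case the dialogue asks you to correct.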
Variable Transformations
ClustanGraphics allows you to transform ordinal or continuous variables. The transformation options are
none, range or z-scores. Range divides each value by the range of valid values, so that the transformed values range between zero and 1. z-scores transforms the values so that
they have a mean of zero and a standard deviation of 1. To specify the transformation of any variable, click on the variable transform cell and select from the drop-down list
(right). More details of data transformations are here.
Transformations are not available for binary or nominal variables. A binary variable is stored as a present/absent score for each case (e.g. CreditAllowed is either true or false). Likewise, a nominal variable is stored as a present/absent score for each category represented by an integer code (e.g. ClientSector=5 is held as true for sector 5 and false for all other sector codes).
Variable Weights
With ClustanGraphics you can have different weights for each variable. The default is a weight of 1, so that all variables have equal weight. If you want to give some variables more emphasis than others you can specify differential variable weights. To do this, click the variable weight cell and type a new weight value (right). Your current choice of weights can also be reviewed and changed in the Edit/Weights dialogue, on the Edit menu.
Masking Variables
If you specify a weight of zero, the variable will be masked from the cluster analysis. In this case, the Edit/Data Types dialogue will show the variable as masked, and its entries will be grayed (right). This is helpful if you want to carry background variables that are "inactive", that is, not used for clustering but nevertheless interpreted in cluster profiling.
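The transformations, present/absent coding, weights and masking described above can be sketched in a few lines. This is an illustration under stated assumptions, not ClustanGraphics code: all function names are hypothetical, missing values are ignored for brevity, the range transformation is assumed to subtract the minimum before dividing by the range (so that results run from 0 to 1), z-scores use the population standard deviation, and the weighted squared Euclidean distance is only one possible way weights might enter a proximity.

```python
import math

def range_scale(values):
    """Range transformation; assumes the minimum is subtracted first."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant variable: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

def z_scores(values):
    """Transform to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]

def indicator_columns(codes):
    """Hold a nominal variable as one present/absent column per category."""
    return {c: [code == c for code in codes] for c in sorted(set(codes))}

def weighted_sq_distance(a, b, weights):
    """Weighted squared Euclidean distance; a weight of 0 masks a variable."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))

scaled = range_scale([10, 20, 30])        # [0.0, 0.5, 1.0]
sector = indicator_columns([5, 2, 5])     # {2: [...], 5: [True, False, True]}
# The second variable has weight 0, so its difference does not contribute.
d = weighted_sq_distance([0.0, 1.0], [1.0, 3.0], [1.0, 0.0])
```

With the weights shown, d depends only on the first variable, which mirrors how a masked background variable stays in the data without influencing the clustering.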
Variable Names
The Edit/Data Types dialogue allows you to change the names of variables. Simply click on a variable's name and edit it in situ (right). Your current choice of variable names can also be reviewed and changed in the Edit/Labels dialogue, on the Edit menu.
Variable Summaries
If you point the cursor at any variable and click the right mouse button, a summary of the current parameters for that variable will be displayed. This helps you check that you have selected the correct type and transformation for the variable (right). You can display a summary table for all your variables by clicking the Summary button. An abbreviated table of Data Types specifications can be printed by clicking the Print button.
Confirming Data Types
When you click OK in the Edit/Data Types dialogue, you will be asked whether you wish the changed specifications to be confirmed. At this point you can, if you wish, revert to the type settings previously recorded; or you can update to the new settings entered into the dialogue. Don't forget to save your ClustanGraphics file so that your changes will be correctly reproduced when you next open your file.
You are now ready to run a cluster analysis on mixed data types. The current options are hierarchical cluster analysis using Compute Proximities, Nearest Neighbours, k-Means Analysis and Classify Cases. For further details, please refer to the file DataTypes.doc which accompanies ClustanGraphics, or view a worked example of Gower's Similarity Coefficient with mixed data types here.
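For readers without the worked example to hand, the general form of Gower's similarity coefficient can be sketched as follows. This is the textbook formula in a simplified variant, not necessarily what ClustanGraphics computes: binary and nominal variables score 1 on a match and 0 otherwise (simple matching), ordinal and continuous variables score 1 minus the absolute difference divided by the variable's range, variables missing in either case are skipped, and the per-variable scores are averaged using the variable weights. The function name, the use of None for missing values and the example ranges are all assumptions.

```python
def gower_similarity(a, b, types, ranges, weights=None):
    """Weighted Gower-style similarity between two cases (0 to 1).

    types   : 'binary', 'nominal', 'ordinal' or 'continuous' per variable
    ranges  : range of valid values per variable (ignored for binary/nominal)
    weights : variable weights; a weight of 0 masks the variable
    """
    if weights is None:
        weights = [1.0] * len(types)
    num = 0.0
    den = 0.0
    for x, y, t, r, w in zip(a, b, types, ranges, weights):
        if x is None or y is None or w == 0:
            continue  # missing or masked: variable does not take part
        if t in ("binary", "nominal"):
            s = 1.0 if x == y else 0.0
        else:
            s = (1.0 - abs(x - y) / r) if r else 1.0
        num += w * s
        den += w
    if den == 0:
        raise ValueError("no comparable variables for this pair of cases")
    return num / den

# CreditAllowed, ClientSector, VolumeLevel, InvoiceValue (illustrative values)
types = ["binary", "nominal", "ordinal", "continuous"]
ranges = [None, None, 4, 1000.0]
case_a = [1, 5, 2, 200.0]
case_b = [1, 3, 4, 700.0]
print(gower_similarity(case_a, case_b, types, ranges))  # 0.5
```

The per-variable scores in this example are 1, 0, 0.5 and 0.5, so the equally weighted average is 0.5. Setting a weight to zero or leaving a value missing simply removes that variable from both the numerator and the denominator.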
© 1998 Clustan Ltd.