# Procrustes Analysis

## Procrustes Analysis and PROTEST

Procrustes Analysis (least-squares orthogonal mapping) is a method of comparing two sets of data. Simply put, the method is based on matching corresponding points (landmarks) from each of the two data sets. When dealing with morphometric data, these landmarks represent points or physical locations (e.g. centre of the eye) on individuals. Corresponding landmarks are the same landmark on two different individuals. In community ecology studies, the two sets of landmarks might represent the results of an ordination of species at the various sampling sites and either an ordination of the environment at those sites or the original environmental data. In that case the landmarks are the sampling locations.

## What happens in Procrustes Analysis?

An example of Procrustes Analysis is shown using two simple configurations (Figure A). The corresponding landmarks are shown as upper- and lower-case letters. The objective is to minimize the sum of the squared deviations between corresponding landmarks (termed the error and denoted m²; Gower 1971) by translating, rotating and dilating one configuration to match the other configuration (i.e. the target). Figure B shows the configurations following translation, i.e. the data have a common centroid. After rotation (Figure C), the data are adjusted by dilation (scaling) such that m² is minimized (Figure D). The deviations between corresponding landmarks are called vector residuals. A small vector residual indicates close agreement between the corresponding landmarks.
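The three steps described above (translation, rotation, dilation) can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `procrustes_fit` is hypothetical, and the sketch assumes that both configurations are standardized to unit sum of squares before fitting, which is one common convention and keeps `m2` between 0 and 1. The rotation is obtained from a singular value decomposition and may include a reflection.

```python
import numpy as np

def procrustes_fit(target, source):
    """Fit `source` to `target` by translation, rotation and dilation.

    Both arguments are (landmarks x dimensions) arrays with rows in
    corresponding order. Returns (m2, fitted_source, centred_target).
    """
    X = np.asarray(target, dtype=float)
    Y = np.asarray(source, dtype=float)

    # Translation: give both configurations a common centroid (the origin).
    X0 = X - X.mean(axis=0)
    Y0 = Y - Y.mean(axis=0)

    # Assumption: standardize each configuration to unit sum of squares.
    X0 = X0 / np.linalg.norm(X0)
    Y0 = Y0 / np.linalg.norm(Y0)

    # Rotation: the orthogonal matrix minimizing the squared deviations
    # comes from the SVD of the cross-product matrix.
    U, s, Vt = np.linalg.svd(Y0.T @ X0)
    rotation = U @ Vt

    # Dilation: the least-squares scale factor for unit-size configurations.
    scale = s.sum()

    fitted = scale * (Y0 @ rotation)
    m2 = float(np.sum((X0 - fitted) ** 2))
    return m2, fitted, X0
```

Applied to a square and a rotated, enlarged and shifted copy of the same square, `m2` comes out as zero up to floating-point error, because the three operations can undo the differences exactly.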

## What is PROTEST or PROcrustean randomization TEST?

The typical Procrustes analysis simply provides a descriptive summary and graphical comparison of two configurations of points (i.e. two data sets). Although a measure of fit is provided (m²), there is no formal means of assessing whether the fit is better than expected by chance. However, by applying a randomization or permutation approach to one of the data sets (PROTEST), we can determine whether the original m² is smaller than expected by chance (i.e. whether the two data sets exhibit greater concordance than expected at random). Under the permutation approach we randomly reorder the observations in one data set while maintaining the covariation structure within that data set. We then fit the randomized data set to the second data set and check whether the resulting m² is as small as or smaller than the original m². We repeat this exercise a large number of times (N) and count the number of times (n) that the randomized m² was as small as or smaller than the observed m². The associated probability level is calculated as (n+1)/(N+1), where n is the number of randomized m² values as small as or smaller than the observed m² and N is the total number of randomized values calculated. The “1” in the numerator and denominator includes the observed value as a member of the class of possible values.
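The permutation procedure above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the published implementation: the names `m2_statistic` and `protest`, the default of 999 permutations and the fixed random seed are all hypothetical choices, and both configurations are assumed to be standardized to unit sum of squares so that `m2` equals one minus the squared sum of the singular values of the cross-product matrix.

```python
import numpy as np

def m2_statistic(X, Y):
    """Procrustes sum of squared deviations for unit-size configurations."""
    X0 = X - X.mean(axis=0)
    Y0 = Y - Y.mean(axis=0)
    X0 = X0 / np.linalg.norm(X0)
    Y0 = Y0 / np.linalg.norm(Y0)
    # After optimal rotation and dilation, m2 = 1 - (sum of singular values)^2.
    s = np.linalg.svd(Y0.T @ X0, compute_uv=False)
    return 1.0 - s.sum() ** 2

def protest(X, Y, n_perm=999, seed=0):
    """Permutation test of the concordance between X and Y.

    Rows of X and Y are corresponding observations (landmarks).
    Returns (observed_m2, p_value) with p = (n + 1) / (N + 1).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)

    observed = m2_statistic(X, Y)
    n = 0
    for _ in range(n_perm):
        # Reorder whole rows of Y: the correspondence with X is broken,
        # but the covariation structure within Y is kept intact.
        shuffled = Y[rng.permutation(len(Y))]
        if m2_statistic(X, shuffled) <= observed:
            n += 1

    # The "+1" terms count the observed value among the possible values.
    return observed, (n + 1) / (n_perm + 1)
```

With N = 999 permutations the smallest attainable probability is 1/1000, reached when no randomized fit is as good as the observed one.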

## Mantel Test versus PROTEST

While the significance test of the m² statistic provides an overall measure of the concordance between two data sets, the graphical match of the data sets and the associated residuals provide a much richer source of information than is available from a Mantel test. Where corresponding landmarks (i.e. observations) match closely, there is good agreement between the two data sets. Landmarks that match poorly are observations that depart from the overall trend of the sample, much like points with large residuals in a regression analysis. In addition, the statistical power of PROTEST has been shown to be equal to or superior to that of the Mantel test (Peres-Neto and Jackson 2000). So if there is an underlying relationship between the two sets of data, PROTEST is at least as capable of detecting it, and it also provides superior interpretative guidelines given the graphical nature of the results.
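To illustrate how residuals point to poorly matching observations, the sketch below computes the length of each vector residual after the fit, so that landmarks can be ranked in the same way as residuals from a regression. The function name `vector_residuals` is hypothetical, and the fit assumes both configurations are standardized to unit sum of squares, so residual lengths are in those standardized units.

```python
import numpy as np

def vector_residuals(target, source):
    """Length of the vector residual at each landmark after a Procrustes fit."""
    X = np.asarray(target, dtype=float)
    Y = np.asarray(source, dtype=float)

    # Centre and standardize both configurations (assumed convention).
    X0 = X - X.mean(axis=0)
    Y0 = Y - Y.mean(axis=0)
    X0 = X0 / np.linalg.norm(X0)
    Y0 = Y0 / np.linalg.norm(Y0)

    # Optimal rotation and dilation of the source onto the target.
    U, s, Vt = np.linalg.svd(Y0.T @ X0)
    fitted = s.sum() * (Y0 @ (U @ Vt))

    # One residual length per landmark; their squares sum to m2.
    return np.linalg.norm(X0 - fitted, axis=1)

# Twelve landmarks on a circle; landmark 0 is displaced in the second set.
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
config_a = np.column_stack([np.cos(angles), np.sin(angles)])
config_b = config_a.copy()
config_b[0] += [0.5, 0.0]

residuals = vector_residuals(config_a, config_b)
worst = int(np.argmax(residuals))  # index of the landmark that fits worst
```

In this constructed example the displaced landmark has by far the largest residual, while the remaining landmarks show only small residuals caused by the shift in centroid and scale.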

Although most biological applications of procrustean methods involve morphometric data sets, the technique can be extended to many additional areas where multiple data sets are compared. For example, ecologists often test and examine the relationship between community composition across a series of sites and the environmental conditions at those sites. In this case the landmarks are the various sampling sites where two sets of data are collected, i.e. the community composition and the environmental conditions (e.g. Jackson and Harvey 1993; Olden et al. 2000; Pazkowski and Tonn 2000). Similar cases can be developed for species morphology, genetic composition, spatial arrangement, behavioural characteristics and many other types of data sets. PROTEST provides a means of determining whether two or more data sets exhibit significant association and, if so, which sites (or species, individuals, etc.) do or do not fit the general pattern.