Shape analysis with MATLAB

The data set and functions used on this page are in this file: shape.zip.

All one needs to do is unzip the file into a new directory, open MATLAB, change directory in MATLAB to the new directory, then the following should work.

Notes

  • I have documented some of these functions within the m-files, and this can be accessed in MATLAB with the help command: help OPA, help GPA, etc.
  • A configuration of k landmarks in m-D space is represented by an m-by-k matrix - the alternative convention to the one Ian Dryden used in his R shapes package - and a sample of n configurations is represented by an m-by-k-by-n matrix.
  • The example dataset was kindly provided by Charlie Laughton.

Loading and plotting data

First to load some data:

load moleculeData.mat;

This puts a 3-by-22-by-250 array called moleculeData into the workspace. Data can be visualised with plotShape (this function works for arrays of n ≥ 1 and with m=2 or m=3). For example, to visualise the first configuration:

plotShape(moleculeData(:,:,1));

There is a rotate button (a box with an arrow around it) on the figure toolbar. Selecting this then left-clicking somewhere in the figure and and dragging allows the plot to be rotated. To superimpose the second configuration, this time plotted with circles:

hold on; plotShape(moleculeData(:,:,2),'o');

The shape of the two configurations is similar, but they are orientated and centred differently. Plotting all the configurations together using plotShape(moleculeData) reveals a big mess!

Registering configurations

Registering one configuration to another can be done with Ordinary Procrustes analysis (function OPA). Registering the second to the first:

secondRegToFirst = OPA(moleculeData(:,:,1),moleculeData(:,:,2));

and plotting again

figure; plotShape(moleculeData(:,:,1));
hold on; plotShape(secondRegToFirst,'o');

shows a better comparison. Note that OPA by default performs partial Procrustes analysis (i.e. no scaling); full Procrustes analysis is performed if true is entered as a third argument of OPA.

Registering a whole sample of configurations as closely as possible to each other can be done with generalised Procrustes analysis (function GPA):

moleculeData_GPA = GPA(moleculeData);
figure; plotShape(moleculeData_GPA);

Again, GPA defaults to partial Procrustes analysis; full Procrustes analysis is performed with the call GPA(moleculeData,true).

Mean shapes

One way to define mean shape is as the arithmetic mean of the GPA-registered configurations

moleculeData_GPAmean = mean(GPA(moleculeData),3);

An alternative is the multidimensional-scaling mean shape. When k is large the MDS mean shape is much faster to calculate than the GPA mean and approximates it well for large, non-highly-dispersed samples. The MDS mean is:

moleculeData_MDSmean = MDSmean(moleculeData);

Comparing the two means for our moleculeData (including registration with OPA),

figure; plotShape(moleculeData_GPAmean); hold on;
plotShape(OPA(moleculeData_GPAmean,moleculeData_MDSmean),'o');

shows with these data that the means are nearly identical.

(For determining confidence regions for the mean shape, and for testing equality of mean shape for two samples, I will add here soon a description of how to use our bootstrap code)

Shape variation: principal component analysis

shapePCA shows how shape varies along principal components; e.g.

shapePCA(moleculeData);

shows shape variation at +/- 3 standard deviations long the first 4 PCs. The figure titles show information such as how much of the variation is explained in the respective PC. The axes properties are locked together so that rotating one figure rotates all of them.

The second argument can be used to specify the PCs to be considered, and the third argument to specify the number of SDs for which to show variation; e.g.

shapePCA(moleculeData,[3 5],4);

shows the 3rd and 5th PCs plotted at +/- 4 SDs.

With a fourth argument equal to true one can also plot an animation of the variation along a chosen PC:

shapePCA(moleculeData,1,3,true);

Currently this can be done only one PC at a time.

Other simple stuff - removing location and scale

A configuration (or sample of configurations) can be centred with centre and Helmertised with helm. Helmertised configurations can be de-Helmertised with dehelm. They can be normalised to have unit Frobenius norm with unitFromNorm. Hence

unitFrobNorm(helm(moleculeData));

are the Helmertised pre-shapes and

unitFrobNorm(centre(moleculeData));

are the centred pre-shapes.

Note that dehelm(helm(X)) gives the same result as centre(X).