We present the Minimum Message Length (MML) principle in its general form, and show its success in estimating the parameters of the von Mises circular distribution. This distribution is well suited to modelling protein dihedral angles.
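As background for the von Mises distribution mentioned above, the following is a minimal sketch of its density and of a plain maximum-likelihood (ML) fit of its mean direction and concentration. It is not the MML estimator discussed in the talk, and it is not taken from Snob. The function names (`bessel_i`, `von_mises_pdf`, `fit_von_mises_ml`) and the cap `kappa_max` are illustrative assumptions.

```python
import math


def bessel_i(order, x, tol=1e-16):
    """Modified Bessel function I_order(x), order 0 or 1, from its power series."""
    half = x / 2.0
    term = half ** order / math.factorial(order)  # k = 0 term of the series
    total = term
    k = 0
    while True:
        k += 1
        term *= half * half / (k * (k + order))   # ratio of successive terms
        total += term
        if term <= tol * total:
            return total


def von_mises_pdf(theta, mu, kappa):
    """Density of the von Mises distribution at angle theta (radians)."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i(0, kappa))


def fit_von_mises_ml(angles, kappa_max=200.0):
    """Maximum-likelihood (not MML) estimates of mean direction and concentration.

    kappa_max is an assumed upper bound for the bisection search; very tightly
    concentrated samples are capped at this value.
    """
    n = len(angles)
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    mu = math.atan2(s, c)                # direction of the resultant vector
    r_bar = math.hypot(c, s) / n         # mean resultant length, in [0, 1]

    # Solve I1(kappa) / I0(kappa) = r_bar by bisection; the ratio is increasing.
    lo, hi = 0.0, kappa_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if bessel_i(1, mid) / bessel_i(0, mid) < r_bar:
            lo = mid
        else:
            hi = mid
    return mu, 0.5 * (lo + hi)


if __name__ == "__main__":
    import random

    random.seed(0)
    # Synthetic angles standing in for dihedral-angle data (assumed, not real data).
    sample = [random.vonmisesvariate(1.0, 4.0) for _ in range(1000)]
    print(fit_von_mises_ml(sample))
```

The fit takes angles in radians, so dihedral angles recorded in degrees would need converting with `math.radians` first.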
We present work by Dowe et al. (PSB96) and subsequent work by Edgoose et al. (PSB98) applying the MML clustering program, Snob, to cluster protein conformation classes from protein databases. Both pieces of work uncover protein classes, but the more recent work takes better account of the serial correlation in the data. The more recent work also finds a class whose presence seems to suggest something about the order in which parts of a protein fold.
We also investigate a simple Boolean theory of secondary structure conformation (Extended, Helix or Other) as a function of the amino acids surrounding a site.
We also look at work by Powell et al. (1998) on using MML and related information-theoretic methods for finding significant strings in DNA, and compare this with earlier work by Milosavljevic and Jurka (1993) and Milosavljevic (1995).
Other work will also be presented.
I was Program Chair of the Information, Statistics and Induction in Science (ISIS) conference, held in Melbourne, Australia, on 20-23 August 1996 and attended by R. J. Solomonoff, C. S. Wallace, J. J. Rissanen and others.
Chris Wallace and I are authors of the Snob program for unsupervised clustering and mixture modelling. Snob does Minimum Message Length (MML) mixture modelling of Gaussian, discrete multi-state (Bernoulli or categorical), Poisson and von Mises circular distributions. Further details on Snob are given here. The Snob software is available (subject to conditions) for private, academic use.
Dr. David L Dowe
School of Computer Science and Software Engineering
Monash University, Clayton, Victoria 3168
e-mail : firstname.lastname@example.org
Tel:+61 3 9905-5776 Fax:+61 3 9905-5146