Genetic network analysis - The millennium opening version

Zoltan Szallasi and Roland Somogyi

Abstract:

Technological developments such as cDNA microarray-based RNA quantitation and proteomics have made massively parallel biological data acquisition possible. This has shifted our attention towards a more complex understanding of molecular biology. In addition to determining the roles of individual genes, genetic network analysis enables us to study cells as a complex network of biochemical factors. Understanding genetic networks will also provide us with novel drug targets, sensitive diagnostics for individualized therapy, and early indicators of toxic drug effects. The key to a successful application of these technologies lies in properly matching experimental design with data mining and predictive modeling methods.

The aim of this tutorial is to provide an introduction to this fast-developing, complex research field and an overview of its experimental, theoretical and computational foundations. In particular, we will discuss the following issues: 1) Data scope and quality - nature, precision and information content of massively parallel biological data sets. 2) Principles of molecular networks - forward modeling of genetic networks based on Boolean, continuous and stochastic networks. 3) Inference of genetic network architectures - pathways and key molecular control processes. Genetic network inference from gene expression time series and perturbation experiments can be conducted in two stages: a) Pathway inference through integrated cluster analysis, taking into account expression, functional pathway and promoter structure data. We will also provide a comparative evaluation of widely used clustering methods, such as hierarchical clustering, self-organizing maps and k-means algorithms. b) Reverse engineering of causal molecular relationships from molecular activity data, using methods ranging from Boolean networks to correlation matrices and continuous additive models.

By the end of the tutorial, attendees can expect to be able to i) make informed decisions about the high-throughput experimental technologies applicable to their system, ii) estimate the amount and nature of information they require to solve their problem, and iii) locate literature, software or collaborators for computational analysis of large data sets.

The tutorial is designed to benefit both molecular biologists who seek a good intuitive understanding of how theory and computer science can advance their work and computer scientists who are looking for a better understanding of molecular biological data and ways to help biologists analyze genetic networks.

We have significantly updated last year's tutorial in the following three areas: 1) Data preprocessing to reduce dimensionality; 2) Comparative evaluation of clustering algorithms; and 3) Introduction of generative models in the analysis of gene expression matrices.

Biographical sketches

Zoltan Szallasi, M.D., is an assistant professor in the Department of Pharmacology, Uniformed Services University. He has been active in both the theoretical and experimental aspects of genetic network analysis of cancer. His lab is conducting cDNA microarray-based large-scale gene expression measurements in breast cancer and developing experimental and computational analytical tools in order to achieve a more complex understanding of this disease.

Dr. Roland Somogyi is Chief Scientific Officer for Molecular Mining Corporation. Given his background in experimental and theoretical biophysics, he has pioneered an integrated research strategy toward solving complex biological regulatory networks. This approach treats the organism as an information processing system, in which the information stored in the genome, together with environmental inputs, determines the state and dynamics of the molecular network. Dr. Somogyi and his collaborators have developed methods for high-fidelity, high-throughput gene expression data acquisition, exploratory data mining and visualization, and elementary methods for reverse engineering of regulatory network architectures.

Dr. Somogyi has over ten years of experience in academic, government and industry research, most recently as Director of Neurobiology at Incyte Pharmaceuticals. Prior to this he was a principal investigator in the Laboratory of Neurophysiology, NINDS, at the National Institutes of Health in Bethesda, Maryland. A recognized leader in the area of gene networks and pathways based on gene expression data, Dr. Somogyi is frequently called upon as an invited speaker at international meetings in computational biology, genomics and neuroscience, and has organized a number of conference symposia. In addition, his expertise as an advisor on bioinformatics and genomics has been sought by high-level government committees in both North America and Europe. Dr. Somogyi holds an M.Sc. in biology and a Ph.D. in biophysics and physiology (summa cum laude) from the University of Konstanz, Germany.

