PSB 2003 Tutorial

The Microarray Workflow: Design, Processing and Management

Shannon K. McWeeney


Due to the potential of microarray technology, the number of investigators using it is growing exponentially. The design of the experiments and the transformation of the data prior to analysis affect an investigator's ability to answer the question of interest. This tutorial will therefore address statistical issues in the design and processing of the data. Because microarray experiments generate a large amount of information, data management must be planned before the experiments are conducted. The importance of a database management system when dealing with large-scale data sets will also be discussed.


Experimental Design, Processing/Normalization and Data Management are the focus of this three-hour tutorial. The breakdown of the tutorial is as follows:

Hour one: Experimental Design
This section will cover statistical considerations for the design of microarray experiments, with emphasis on cDNA technology. Topics will include spatial layout, direct vs. indirect comparisons, and factorial and loop designs. It is assumed that the audience has a rudimentary knowledge of microarray technology, so no background will be given during the presentation. However, supplementary material will be available in the handouts and on the web prior to the tutorial. Throughout the presentation, examples will be given to illustrate the statistical concepts in context.

Hour two: Processing/Normalization
This section will cover processing and normalization of the data before analysis to remove systematic error and variation. Stages of data processing will be discussed for both cDNA and Affymetrix data. Normalization methods will be compared, along with the underlying assumptions of each method. This material will be integrated with Hour one to examine the impact of design on normalization, e.g., the impact of a selected chip.

Hour three: Data Management
The final section will focus on data management issues for microarray data. Topics will include a comparison of LIMS vs. data repositories, MIAME compliance, the need for ontologies, integrating a database into your workflow, and cross-platform analyses. Issues of data cleaning and integration for data mining will reinforce concepts from the earlier sections.

Biographical Sketch

Dr. Shannon McWeeney is a research fellow at the Center for Bioinformatics at the University of Pennsylvania under the supervision of Dr. Warren Ewens. She recently accepted an assistant professor appointment in the Department of Public Health (Biostatistics Division) at Oregon Health & Science University and will have a joint appointment in the Bioinformatics and Biostatistics Core (BBC) of the Gene Microarray Shared Resource facility. Her research focuses on statistical analysis of microarray data, with emphasis on time series analysis. At the University of Pennsylvania, she was involved in extending the schema of RAD (RNA Abundance Database) to capture the stages of processing, normalization and analysis. Dr. McWeeney is a member of the MGED normalization / processing working group and is working with other members to develop controlled vocabularies and protocols to describe data transformations. During her tenure as a graduate student at the University of California at Berkeley, Dr. McWeeney was named an Outstanding Graduate Student Instructor and subsequently received a Teaching Effectiveness Award (TEA) from the Graduate Division. The TEA honors a small number of outstanding Graduate Student Instructors who have made a significant contribution to teaching on campus.