Probabilistic Modeling of Systematic Errors in Two-Hybrid Experiments
Sontag D, Singh R, Berger B


Abstract

We describe a novel probabilistic approach to estimating errors in two-hybrid (2H) experiments. Such experiments are frequently used to elucidate protein-protein interaction networks in a high-throughput fashion; however, a significant challenge with these is their relatively high error rate, specifically, a high false-positive rate. We describe a comprehensive error model for 2H data, accounting for both random and systematic errors. The latter arise from limitations of the 2H experimental protocol: in theory, the reporting mechanism of a 2H experiment should be activated if and only if the two proteins being tested truly interact; in practice, even in the absence of a true interaction, it may be activated by some proteins, either by themselves or through promiscuous interaction with other proteins. We describe a probabilistic relational model that explicitly models the above phenomenon and use Markov Chain Monte Carlo (MCMC) algorithms to compute both the probability of an observed 2H interaction being true as well as the probability of individual proteins being self-activating/promiscuous. This is the first approach that explicitly models systematic errors in protein-protein interaction data; in contrast, previous work on this topic has modeled errors as being independent and random. By explicitly modeling the sources of noise in 2H systems, we find that we are better able to make use of the available experimental data. In comparison with Bader et al.'s method for estimating confidence in 2H predicted interactions, the proposed method performed 5-10% better overall, and in particular regimes improved prediction accuracy by as much as 76%.

Supplementary Information: http://theory.csail.mit.edu/probmod2H
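To illustrate why modeling systematic errors changes the confidence assigned to an observed interaction, the following is a minimal sketch, not the paper's probabilistic relational model. It assumes a simple noisy-OR reporter: the reporter fires if a true interaction is detected, if the bait protein self-activates, or if a random error occurs. The function name and all parameter values (`prior_true`, `p_detect`, `p_random_fp`) are hypothetical choices made for illustration only.

```python
def posterior_true_interaction(prior_true, p_self_activating,
                               p_random_fp=0.01, p_detect=0.9):
    """P(true interaction | reporter activated) under an assumed noisy-OR model.

    All parameters are illustrative assumptions, not values from the paper.
    """
    # With no true interaction, the reporter fires only through a spurious
    # cause: bait self-activation (systematic) or a random false positive.
    p_fire_if_false = 1 - (1 - p_self_activating) * (1 - p_random_fp)

    # With a true interaction, the reporter fires if the interaction is
    # detected or if any spurious cause fires anyway.
    p_fire_if_true = 1 - (1 - p_detect) * (1 - p_self_activating) * (1 - p_random_fp)

    # Bayes' rule over the two hypotheses (true interaction vs. none).
    evidence = prior_true * p_fire_if_true + (1 - prior_true) * p_fire_if_false
    return prior_true * p_fire_if_true / evidence


if __name__ == "__main__":
    # Same observation, two different baits (hypothetical numbers).
    print(posterior_true_interaction(prior_true=0.05, p_self_activating=0.0))
    print(posterior_true_interaction(prior_true=0.05, p_self_activating=0.8))
```

Under these assumptions, an observation involving a bait that is likely self-activating carries little evidence of a true interaction, whereas the same observation from a well-behaved bait raises the posterior well above the prior. A model that treats all errors as independent and random cannot make this distinction.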