ÇOKGEN: A Software for the Identification of Rare Copy Number Variation from SNP Microarrays


Gökhan Yavaş¹, Mehmet Koyutürk¹,³, Meral Özsoyoğlu¹, Meetha P. Gould², Thomas Laframboise²,³



¹Department of Electrical Engineering & Computer Science, ²Department of Genetics, ³Center for Proteomics & Bioinformatics, Case Western Reserve University, Cleveland, OH, 44106, USA

Pacific Symposium on Biocomputing 15:371-382 (2010)



Abstract

Until fairly recently, it was believed that essentially all human cells harbor two copies of each locus in the autosomal genome. However, studies have now shown that some segments of the genome are polymorphic with regard to genomic copy number. These copy number variations (CNVs) play a role in various diseases, including Alzheimer's disease, Crohn's disease, autism, and schizophrenia. In the effort to scan the entire genome for these gains and losses of DNA, single nucleotide polymorphism (SNP) arrays have emerged as an important tool. As a result, CNV identification from SNP array data is attracting considerable attention as an algorithmic problem, and many methods have been published over the last few years. However, many of the existing model-based methods train their models on common variations and are therefore less successful at identifying rare CNVs, whose detection may be very important in personalized genomics applications. In this paper, we formulate CNV identification explicitly as an optimization problem with an objective function that is characterized by several adjustable parameters. These parameters can be configured based on the characteristics of the experimental platform and the target application, so that the solution to the optimization problem is the most accurate set of CNV calls. Our method, termed ÇOKGEN, efficiently solves this problem using a variant of the well-known simulated annealing heuristic. We apply ÇOKGEN to data from hundreds of samples and demonstrate its ability to detect known CNVs, both common and rare, at a high level of sensitivity without sacrificing specificity. Furthermore, we show that it performs better than other publicly available methods. The configurability of ÇOKGEN, its computational efficiency, and its accuracy in calling rare CNVs make it particularly useful for personalized genomics applications.
ÇOKGEN is implemented as an R package and is freely available at http://mendel.gene.cwru.edu/laframboiselab/software.php.

