Haplotype Phasing By Multi-Assembly of Shared Haplotypes: Phase-Dependent Interactions Between Rare Variants


Bjarni V. Halldorsson1, Derek Aguiar2,3, Sorin Istrail2,3



1School of Science and Engineering, Reykjavik University, Reykjavik, Iceland; 2Department of Computer Science, Brown University, Providence, RI, USA; 3Center for Computational Molecular Biology, Brown University, Providence, RI, USA

Pacific Symposium On Biocomputing 16:88-99(2011)


Abstract

In this paper we propose algorithmic strategies, Lander-Waterman-like statistical estimates, and genome-wide software for haplotype phasing by multi-assembly of shared haplotypes. Specifically, we consider four types of results which together provide a comprehensive workflow for GWAS data sets: (1) statistics of multi-assembly of shared haplotypes, (2) graph-theoretic algorithms for haplotype assembly based on conflict graphs of sequencing reads, (3) inference of pedigree structure through haplotype sharing via tract-finding algorithms, and (4) multi-assembly of shared haplotypes of cases, controls, and trios. The input to the workflows that we consider is any combination of: (A) genotype data, (B) next generation sequencing (NGS) data, and (C) pedigree information. (1) We present Lander-Waterman-like statistics for NGS projects for the multi-assembly of shared haplotypes. Results are presented in Sec. 2. (2) In Sec. 3, we present algorithmic strategies for haplotype assembly using NGS, NGS + genotype data, and NGS + pedigree information. (3) This work builds on algorithms presented in Halldorsson et al.1 and is part of the same library of tools co-developed for GWAS workflows. (4) Section 3.3.1 contains algorithmic strategies for multi-assembly of GWAS data. We present algorithms for assembling large data sets and for determining and using shared haplotypes to more reliably assemble and phase the data. Workflows 1-4 provide a set of rigorous algorithms which have the potential to identify phase-dependent interactions between rare variants in linkage equilibrium which are associated with cases. They build on our extensive work on haplotype phasing,1-3 haplotype assembly,4,5 and whole genome assembly comparison.6
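
To make the notion of a conflict graph of sequencing reads concrete, the following is a minimal sketch of the standard textbook construction, not the algorithm of this paper. It assumes error-free reads from a single diploid individual, each represented as a mapping from SNP site index to allele (0 or 1). Two reads conflict when they cover a common site and disagree there; under these assumptions the conflict graph is bipartite, and a 2-coloring assigns conflicting reads to opposite haplotypes. The function names `conflict_graph` and `two_color` and the toy reads are illustrative assumptions.

```python
from collections import deque


def conflict_graph(reads):
    """Build the conflict graph of a list of reads.

    Each read is a dict {snp_site_index: allele}, allele in {0, 1}.
    Reads i and j are adjacent if they disagree on at least one shared site.
    """
    adj = {i: set() for i in range(len(reads))}
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            shared = reads[i].keys() & reads[j].keys()
            if any(reads[i][s] != reads[j][s] for s in shared):
                adj[i].add(j)
                adj[j].add(i)
    return adj


def two_color(adj):
    """Return {read: 0 or 1} via BFS 2-coloring, or None if not bipartite.

    A non-bipartite graph means the reads cannot be split into two
    conflict-free groups (for example, because of sequencing errors).
    Labels are only comparable within one connected component.
    """
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None
    return color


# Hypothetical toy data: four reads over SNP sites 0..2.
reads = [
    {0: 0, 1: 1},
    {1: 1, 2: 0},
    {0: 1, 1: 0},
    {1: 0, 2: 1},
]
adj = conflict_graph(reads)
labels = two_color(adj)
```

In this toy input, reads 0 and 1 fall on one side of the coloring and reads 2 and 3 on the other. With real data containing sequencing errors the graph is generally not bipartite, which is why haplotype assembly is usually posed as an optimization problem rather than a plain 2-coloring.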

