Motivation
Genome sequencing and large-scale molecular and systemic phenotyping
are already available for large patient studies, and will soon become
routinely available for all patients. Additionally, consumer sensors
enabling the “quantified self” have now made it possible to collect
even richer individual-level, high-dimensional, multi-scale data more
cheaply and accessibly than ever before. This data deluge is setting
the stage for rapid advances in personalized medicine, enabling better
disease classification, more precise treatment, and improved screening
leading to disease prevention. The increasingly rich repertoire of
molecular and cellular data leads not only to the identity and
structure of disease-related pathways but also to the identification of
disease subtypes, their genetic underpinnings, possible drug targets
and repositioning of drugs to these new targets. Key to realizing the
promises of personalized medicine are robust computational approaches
to handle a wide variety of problems including: hidden structure,
missing data, data heterogeneity, massive sample sizes to detect
associations with rare variants, feature selection over many mostly
irrelevant features, noise, and the problem of multiple testing, among
others.
Recent work illustrates the scope and complexity of this exciting new
field. For example, when relating genotype to phenotype via genome-wide
association studies, population structure and family relatedness can
reduce power and cause spurious association. When assaying traits from
clinical samples, cellular heterogeneity has been shown to confound
results of naïve analyses, but also to reveal novel insights about the
disease. Additionally, many phenotypes of interest are not independent
but are instead coupled by regulatory or other factors. Hypothesizing and
inferring the corresponding hidden factors, such as cell type or
transcription factor activity, has been shown to shed light on data sets
that previously revealed little insight. There is also great potential
to guide treatment by modeling genotype-dependent environmental effects
on medical phenotypes by leveraging longitudinal multi-omics data, as
well as leveraging and modeling data from very large sets of electronic
medical records. Additionally, there is increasing activity in the
implementation of personalized medicine in clinical settings, reporting
new data, perspectives, and challenges that will inform future efforts
to implement personalized medicine at the bedside. In spite of this
recent progress, further advances in statistical modeling and machine
learning combined with informed clinical insights are still needed to
realize the promise of personalized medicine and computationally informed
therapy.
Recent breakthroughs in genome editing technologies hold promise to
revolutionize and accelerate efforts to improve modeling and prediction
of multi-scale biological systems. Technologies such as those based on
CRISPR/Cas9 offer powerful and precise new means to systematically
interrogate and perturb genome biology. Precise genome editing
technologies offer unique methods and data that may be incorporated
into a next generation of computational approaches for personalized
medicine. For example, precise genomic editing, profiling, and
computational modeling of patient-derived iPSCs could form
the basis of novel diagnostics.
This session explores new and open problems pertaining to various
genome-wide and other large-scale data, including rare and common SNPs,
structural variants, epigenetic scans, multi-omic data, intermediate
phenotypes, clinical variables from electronic medical records, disease
and quantified-self sensor-based data. We will particularly embrace
submissions that span several of these types of data. The focus will be
on methods that are scalable to real-world problems and help to elicit
results from genome sequence analysis along with high-dimensional
phenotype data. We will welcome four types of contributions: (1)
descriptions of new problems and ideas on how to tackle them, (2)
development of improved solutions to existing problems, (3) adaptations
that allow existing methods to scale to real-world data sets, and (4)
reports on results from such methods including validated diagnoses
based on novel genetic information. We further explicitly invite
contributions that have direct projected use for therapeutic decisions
and treatment.
Examples of topics and problems within the scope of this session:
Other topics within the subject area are welcome.