- Investigator, Howard Hughes Medical Institute
- Distinguished Professor, Biomolecular Engineering, University of California, Santa Cruz
- Director, Center for Biomolecular Science & Engineering, University of California, Santa Cruz
- Director, UCSC Cancer Genomics Hub, University of California, Santa Cruz
- Scientific Co-Director, California Institute for Quantitative Biosciences (QB3)
- Cofounder, Genome 10K Project
Every human disease is a rare disease at the molecular level. No single institute has enough patients to understand any particular molecular subtype. For genomics to benefit medicine and science, we must share data. I outline the data standards and Application Programming Interfaces developed by the Global Alliance for Genomics and Health (GA4GH) that are intended to address this issue, and highlight a few global genomics projects that use them.
Currently the human reference genome GRCh38 captures only a tiny fraction of common human genetic variation in its chosen alternative haplotype regions, and these are seldom used. I describe ideas discussed by the GA4GH for future extensions of the reference genome into a fuller graph-like structure to more adequately capture human genetic variation, so that the reference itself becomes a source of such information. Further, there are many different ways to map individual patient DNA and call genetic variants relative to the reference genome. I describe a new scheme being developed with assistance from the GA4GH in which mapping to the reference genome and calling variants would become a precisely defined and relatively stable process, with a well-defined incremental update when the reference genome expands to a more comprehensive version. This will enable a better standardized and more accurate discourse about human genetic variation for science and medicine.
David Haussler develops new statistical and algorithmic methods to explore the molecular function, evolution, and disease process in the human genome, integrating comparative and high-throughput genomics data to study gene structure, function, and regulation. He is credited with pioneering the use in genomics of hidden Markov models (HMMs), stochastic context-free grammars, and discriminative kernel methods. As a collaborator on the international Human Genome Project, his team posted the first publicly available computational assembly of the human genome sequence on the Internet on July 7, 2000. His team subsequently developed the UCSC Genome Browser, a web-based tool that is used extensively in biomedical research and serves, along with the Ensembl platform, virtually all large-scale vertebrate genomics projects, including NHGRI's ENCODE project, the 1000 Genomes Project, and NCI's TCGA. He built the CGHub database to hold NCI's cancer genome data and is a co-founder and organizing member of the Global Alliance for Genomics and Health (GA4GH), a coalition of the top research, health care, and disease advocacy organizations that have taken the first steps to standardize and enable secure sharing of genomic and clinical data.
Haussler received his Ph.D. in computer science from the University of Colorado at Boulder. He is a member of the National Academy of Sciences and the American Academy of Arts and Sciences and a fellow of AAAS and AAAI. He has won a number of awards, including the 2011 Weldon Memorial Prize for application of mathematics and statistics to biology, the 2009 ASHG Curt Stern Award in Human Genetics, the 2008 Senior Scientist Accomplishment Award from the International Society for Computational Biology, the 2006 Dickson Prize for Science from Carnegie Mellon University, and the 2003 ACM/AAAI Allen Newell Award in Artificial Intelligence.
- Professor and Division Chief
- University of California, San Diego
- Division of Biomedical Informatics
Genome data, when coupled with detailed clinical data (i.e., phenotype descriptions that characterize disease states), can be extremely valuable for the study of health and disease. However, sharing large amounts of detailed clinical and genome data for research is currently difficult because the known problems and prescribed solutions seem to lag behind by at least a decade. Combining clinical and molecular data requires special attention to ethical, social, and legal issues that can be partially solved with technology.
I will discuss current solutions to the problem of protecting individual and institutional privacy while promoting research at the intersection of genome and data sciences. While there has been steady progress in making data, software, and systems available for research, a significant shift in the way we think about this issue is needed. Technology and policy need to complement each other to address the requirements of different stakeholders (patients, clinicians, researchers, healthcare administrators, and the public in general). I will discuss current initiatives to promote responsible sharing of human-subjects data in a way that minimizes the risk of privacy violations.
Lucila Ohno-Machado, MD, MBA, PhD, received her medical degree from the University of Sao Paulo and her doctoral degree in medical information sciences and computer science from Stanford. She is Associate Dean for Informatics and Technology and the founding chief of the Division of Biomedical Informatics at UCSD, where she leads a group of faculty with diverse backgrounds in medicine, nursing, informatics, and computer science. Prior to her current position, she was on the faculty at Brigham and Women's Hospital, Harvard Medical School, and at the MIT Division of Health Sciences and Technology. Dr. Ohno-Machado is an elected fellow of the American College of Medical Informatics, the American Institute for Medical and Biological Engineering, and the American Society for Clinical Investigation. She has served as editor-in-chief of the Journal of the American Medical Informatics Association since 2011. She directs the NIH-funded iDASH National Center for Biomedical Computing.