John Darrell Van Horn1, Lily Fierro2, Jeana Kamdar1, Jonathan Gordon2, Crystal Stewart1, Avnish Bhattrai1, Sumiko Abe1, Xiaoxiao Lei1, Caroline O'Driscoll1, Aakanchha Sinha2, Priyambada Jain2, Gully Burns2, Kristina Lerman2, José Luis Ambite2
1USC Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California
2Information Sciences Institute, University of Southern California
Email: jvanhorn@usc.edu, jeana.kamdar@ini.usc.edu, crystal.stewart@ini.usc.edu, avnish.bhattrai@ini.usc.edu, Sumiko.abe@ini.usc.edu, xiaolei@ini.usc.edu, caroline.odriscoll@ini.usc.edu, ambite@isi.edu, lfierro@isi.edu, jgordon@isi.edu, burns@isi.edu, lerman@isi.edu, priyambj@isi.edu
Pacific Symposium on Biocomputing 23:292-303(2018)
© 2018 World Scientific
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License.
The biomedical sciences have experienced an explosion of data that threatens to overwhelm many current practitioners. Without easy access to data science training resources, biomedical researchers may find themselves unable to wrangle their own datasets. In 2014, to address the challenges posed by such a data onslaught, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative. To this end, the BD2K Training Coordinating Center (TCC; bigdatau.org) was funded to facilitate both in-person and online learning and to open up the concepts of data science to the widest possible audience. Here, we describe the activities of the BD2K TCC and its focus on the construction of the Educational Resource Discovery Index (ERuDIte), which identifies, collects, describes, and organizes online data science materials from BD2K awardees, open online courses, and videos from scientific lectures and tutorials. ERuDIte now indexes over 9,500 resources. Given the richness of online training materials and the constant evolution of biomedical data science, computational methods applying information retrieval, natural language processing, and machine learning techniques are required; in effect, this means using data science to inform training in data science. In so doing, the TCC seeks to democratize novel insights and discoveries brought forth via large-scale data science training.