Khader Shameer1,2, Kipp W. Johnson1,2, Alexandre Yahi7, Riccardo Miotto1,2, Li Li1,2, Doran Ricks3, Jebakumar Jebakaran4, Patricia Kovatch1,4, Partho P. Sengupta5, Annetine Gelijns8, Alan Moskovitz8, Bruce Darrow5, David L. Reich6, Andrew Kasarskis1, Nicholas P. Tatonetti7, Sean Pinney5, Joel T. Dudley1,2,8
1Department of Genetics and Genomics, Icahn Institute of Genomics and Multiscale Biology
2Institute of Next Generation Healthcare, Mount Sinai Health System
3Decision Support, Mount Sinai Health System
4Mount Sinai Data Warehouse, Icahn Institute of Genomics and Multiscale Biology
5Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai
6Department of Anesthesiology, Icahn School of Medicine at Mount Sinai
7Departments of Biomedical Informatics, Systems Biology and Medicine, Columbia University Medical Center
8Population Health Science and Policy, Mount Sinai Health System
Email: joel.dudley@mssm.edu
Pacific Symposium on Biocomputing 22:276-287(2017)
© 2017 World Scientific
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License.
Reduction of preventable hospital readmissions that result from chronic or acute conditions like stroke, heart failure, myocardial infarction and pneumonia remains a significant challenge for improving the outcomes and decreasing the cost of healthcare delivery in the United States. Patient readmission rates are relatively high for conditions like heart failure (HF) despite the implementation of high-quality healthcare delivery operation guidelines created by regulatory authorities. Multiple predictive models are currently available to evaluate potential 30-day readmission rates of patients. Most of these models are hypothesis driven and repeatedly assess the predictive abilities of the same set of biomarkers as predictive features. In this manuscript, we discuss our attempt to develop a data-driven, electronic medical record-wide (EMR-wide) feature selection approach and subsequent machine learning to predict readmission probabilities. We have assessed a large repertoire of variables from electronic medical records of heart failure patients in a single center. The cohort included 1,068 patients, of whom 178 were readmitted within 30 days (16.66% readmission rate). A total of 4,205 variables were extracted from the EMR, including diagnosis codes (n=1,763), medications (n=1,028), laboratory measurements (n=846), surgical procedures (n=564) and vital signs (n=4). We designed a multistep modeling strategy using the Naïve Bayes algorithm. In the first step, we created individual models to classify the cases (readmitted) and controls (non-readmitted). In the second step, features contributing to predictive risk from independent models were combined into a composite model using a correlation-based feature selection (CFS) method. All models were trained and tested using a 5-fold cross-validation method, with 70% of the cohort used for training and the remaining 30% for testing.
Compared to existing predictive models for HF readmission rates (AUCs in the range of 0.6-0.7), results from our EMR-wide predictive model (AUC=0.78; Accuracy=83.19%) and phenome-wide feature selection strategies are encouraging and reveal the utility of such data-driven machine learning. Fine tuning of the model, replication using multi-center cohorts and a prospective clinical trial to evaluate its clinical utility would help the adoption of the model as a clinical decision system for evaluating readmission status.
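The abstract names correlation-based feature selection (CFS) for the second modeling step but does not describe how a candidate feature subset is scored. The sketch below is illustrative only and is not the authors' implementation. It assumes the commonly cited CFS merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean feature-to-class correlation and r_ff the mean feature-to-feature correlation. It also assumes absolute Pearson correlation as the correlation measure (a simplification) and a greedy forward search. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cfs_merit(columns, labels, subset):
    """CFS merit of a non-empty feature subset (assumed heuristic, see lead-in)."""
    k = len(subset)
    # Mean absolute feature-to-class correlation.
    r_cf = sum(abs(pearson(columns[j], labels)) for j in subset) / k
    if k == 1:
        return r_cf
    # Mean absolute feature-to-feature correlation over all unordered pairs.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, labels):
    """Forward selection: add the feature that most improves merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        scored = [(cfs_merit(columns, labels, selected + [j]), j) for j in remaining]
        merit, j = max(scored, key=lambda t: t[0])
        if merit <= best + 1e-12:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best


# Hypothetical toy cohort: 1 = readmitted within 30 days, 0 = not readmitted.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
columns = [
    [1, 1, 0, 0, 1, 0, 1, 0],  # feature 0: perfectly tracks the label
    [1, 1, 0, 0, 1, 0, 1, 0],  # feature 1: redundant copy of feature 0
    [1, 0, 1, 0, 1, 0, 1, 0],  # feature 2: weakly related to the label
]
selected, best = greedy_cfs(columns, labels)
print(selected, best)
```

In this toy example the redundant copy does not raise the merit, which is the behavior CFS is meant to provide: features are favored when they correlate with the outcome but not with each other. The retained features would then feed a composite classifier such as the Naïve Bayes model described above.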