Feature Selection and Dimension Reduction of Social Autism Data

Peter Washington1, Kelley Marie Paskov2, Haik Kalantarian2,7, Nathaniel Stockham3, Catalin Voss5, Aaron Kline2,7, Ritik Patnaik4, Brianna Chrisman1, Maya Varma5, Qandeel Tariq2,7, Kaitlyn Dunlap2,7, Jessey Schwartz2,6, Nick Haber6, Dennis P. Wall2,7

1Department of Bioengineering, Stanford University
2Department of Biomedical Data Science, Stanford University
3Department of Neuroscience, Stanford University
4Department of Computer Science, Massachusetts Institute of Technology
5Department of Computer Science, Stanford University
6Graduate School of Education, Stanford University
7Department of Pediatrics, Stanford University
Email: dpwall@stanford.edu

Pacific Symposium on Biocomputing 25:707-718(2020)

© 2020 World Scientific
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License.


Autism Spectrum Disorder (ASD) is a complex neuropsychiatric condition with a highly heterogeneous phenotype. Following the work of Duda et al., which uses a reduced feature set from the Social Responsiveness Scale, Second Edition (SRS) to distinguish ASD from ADHD, we performed item-level question selection on answers to the SRS to determine whether ASD can be distinguished from non-ASD using a similarly small subset of questions. To explore feature redundancies among the SRS questions, we performed filter, wrapper, and embedded feature selection analyses. To explore the linearity of the SRS-related ASD phenotype, we then compressed the 65-question SRS into low-dimensional representations using PCA, t-SNE, and a denoising autoencoder. We measured the performance of a multi-layer perceptron (MLP) classifier with the top-ranking questions as input. Classification using only the top-rated question resulted in an AUC of over 92% for SRS-derived diagnoses and an AUC of over 83% for dataset-specific diagnoses. This high redundancy among features has implications for replacing the social behaviors that are targeted in behavioral diagnostics and interventions, where digital quantification of certain features may be obfuscated due to privacy concerns. We similarly evaluated the performance of an MLP classifier trained on the low-dimensional representations of the SRS, finding that the denoising autoencoder achieved slightly higher performance than the PCA and t-SNE representations.
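
The pipeline summarized in the abstract (rank questions, classify with an MLP on the top-ranked question, then classify on a compressed representation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the data are synthetic stand-ins for 65 item scores, mutual information stands in for the unspecified filter method, the MLP architecture is arbitrary, and only PCA is shown (t-SNE and the denoising autoencoder are omitted for brevity).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for questionnaire answers (NOT the study's data):
# 65 items scored 0-3, all driven by one latent severity value, so the
# items are highly redundant with one another.
n_children, n_items = 600, 65
severity = rng.normal(size=n_children)
noise = rng.normal(scale=0.8, size=(n_children, n_items))
X = np.clip(np.round(1.5 + severity[:, None] + noise), 0, 3).astype(int)
y = (severity > 0).astype(int)  # hypothetical binary label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)


def mlp_auc(train_features, test_features):
    """Fit a small MLP (assumed architecture) and return its test-set AUC."""
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    clf.fit(train_features, y_train)
    return roc_auc_score(y_test, clf.predict_proba(test_features)[:, 1])


# Filter-style selection: score each item on the training split only,
# then sort items from most to least informative.
scores = mutual_info_classif(
    X_train, y_train, discrete_features=True, random_state=0
)
ranking = np.argsort(scores)[::-1]

# Classify using only the single top-ranked item.
top = ranking[:1]
auc_top1 = mlp_auc(X_train[:, top], X_test[:, top])

# Classify using a 2-dimensional PCA compression of all 65 items.
pca = PCA(n_components=2).fit(X_train)
auc_pca = mlp_auc(pca.transform(X_train), pca.transform(X_test))

print(f"AUC, top-ranked item only: {auc_top1:.3f}")
print(f"AUC, 2-component PCA:      {auc_pca:.3f}")
```

Because the synthetic items share a single latent factor, one item already carries much of the signal, which mirrors the redundancy argument in the abstract without reproducing its reported numbers.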
