PSB - Abstract

Feature Selection and Dimension Reduction of Social Autism Data

Peter Washington¹, Kelley Marie Paskov², Haik Kalantarian²,⁷, Nathaniel Stockham³, Catalin Voss⁵, Aaron Kline²,⁷, Ritik Patnaik⁴, Brianna Chrisman¹, Maya Varma⁵, Qandeel Tariq²,⁷, Kaitlyn Dunlap²,⁷, Jessey Schwartz²,⁶, Nick Haber⁶, Dennis P. Wall²,⁷

¹Department of Bioengineering, Stanford University
²Department of Biomedical Data Science, Stanford University
³Department of Neuroscience, Stanford University
⁴Department of Computer Science, Massachusetts Institute of Technology
⁵Department of Computer Science, Stanford University
⁶Graduate School of Education, Stanford University
⁷Department of Pediatrics, Stanford University
Email: dpwall@stanford.edu

Pacific Symposium on Biocomputing 25:707-718 (2020)

© 2020 World Scientific
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License.

Abstract

Autism Spectrum Disorder (ASD) is a complex neuropsychiatric condition with a highly heterogeneous phenotype. Following the work of Duda et al., which used a reduced feature set from the Social Responsiveness Scale, Second Edition (SRS) to distinguish ASD from ADHD, we performed item-level question selection on answers to the SRS to determine whether ASD can be distinguished from non-ASD using a similarly small subset of questions. To explore feature redundancies among the SRS questions, we performed filter, wrapper, and embedded feature selection analyses. To explore the linearity of the SRS-related ASD phenotype, we then compressed the 65-question SRS into low-dimension representations using PCA, t-SNE, and a denoising autoencoder. We measured the performance of a multi-layer perceptron (MLP) classifier with the top-ranking questions as input. Classification using only the top-rated question resulted in an AUC of over 92% for SRS-derived diagnoses and an AUC of over 83% for dataset-specific diagnoses. This high redundancy of features has implications for replacing the social behaviors that are targeted in behavioral diagnostics and interventions, particularly where digital quantification of certain features may be obfuscated due to privacy concerns. We similarly evaluated the performance of an MLP classifier trained on the low-dimension representations of the SRS, finding that the denoising autoencoder achieved slightly higher performance than the PCA and t-SNE representations.
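
As a rough illustration of the filter-style step mentioned in the abstract, the sketch below ranks questionnaire items by their single-item AUC. It is not the authors' pipeline: the data are synthetic, the 0..3 item coding is an assumption, the function names (make_synthetic, auc, rank_items) are hypothetical, and a rank-based AUC stands in for the MLP classifier used in the paper.

```python
import random


def make_synthetic(n_rows=400, n_items=65, n_informative=5, seed=0):
    """Generate hypothetical questionnaire data (NOT the study's data).

    Each item is an ordinal score clipped to 0..3 (an assumed coding).
    Only the first `n_informative` items depend on the label.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        row = []
        for j in range(n_items):
            # Informative items are shifted upward for the positive class.
            shift = 1.5 if (label == 1 and j < n_informative) else 0.0
            score = round(rng.gauss(1.0 + shift, 0.8))
            row.append(min(3, max(0, score)))
        rows.append(row)
        labels.append(label)
    return rows, labels


def auc(scores, labels):
    """Rank-based AUC: probability that a positive outscores a negative.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_items(rows, labels):
    """Filter-style selection: score each item alone, sort best first."""
    n_items = len(rows[0])
    scored = [(j, auc([r[j] for r in rows], labels)) for j in range(n_items)]
    return sorted(scored, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    rows, labels = make_synthetic()
    for item, item_auc in rank_items(rows, labels)[:5]:
        print(f"item {item}: AUC = {item_auc:.3f}")
```

A univariate filter like this scores each question in isolation, so it cannot by itself reveal redundancy among questions; the wrapper and embedded analyses and the low-dimension representations described above are not shown here.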
