Peter Washington1, Emilie Leblanc2,3, Kaitlyn Dunlap2,3, Yordan Penev2,3, Maya Varma4, Jae-Yoon Jung2,3, Brianna Chrisman1, Min Woo Sun3, Nathaniel Stockham5, Kelley Marie Paskov3, Haik Kalantarian2,3, Catalin Voss4, Nick Haber6, Dennis P. Wall2,3
1Department of Bioengineering, Stanford University
2Department of Pediatrics (Systems Medicine), Stanford University
3Department of Biomedical Data Science, Stanford University
4Department of Computer Science, Stanford University
5Department of Neuroscience, Stanford University
6School of Education, Stanford University
Pacific Symposium on Biocomputing 26:14-25(2021)
© 2021 World Scientific
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License.
Crowd-powered telemedicine has the potential to revolutionize healthcare, especially during times that require remote access to care. However, sharing private health data with strangers from around the world is incompatible with data privacy standards, requiring a stringent filtration process to recruit reliable and trustworthy workers who can complete the proper training and security steps. The key challenge, then, is to identify capable, trustworthy, and reliable workers through high-fidelity evaluation tasks without exposing any sensitive patient data during the evaluation process. We contribute a set of experimentally validated metrics for assessing the trustworthiness and reliability of crowd workers tasked with providing behavioral feature tags for unstructured videos of children with autism and matched neurotypical controls. The workers are blinded both to diagnosis and to the goal of using the features to diagnose autism. These behavioral labels are fed as input to a previously validated binary logistic regression classifier that detects autism cases from categorical feature vectors. While the metrics do not incorporate any ground-truth labels of child diagnosis, linear regression using the three correlative metrics as input can predict the mean probability of the correct class for each worker with a mean absolute error of 7.51% for performance on the same set of videos and 10.93% for performance on a distinct balanced video set with different children. These results indicate that crowd workers can be recruited based largely on behavioral metrics from a crowdsourced task, enabling an affordable way to filter a general crowd workforce into a trustworthy and reliable diagnostic workforce.
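The worker-evaluation step described above (a linear regression from three per-worker metrics to each worker's mean probability of the correct class, scored by mean absolute error) can be sketched in plain Python. All metric values, worker accuracies, and variable names below are invented for illustration; the actual metrics, data, and fitted coefficients are those reported in the paper.

```python
# Hypothetical sketch of the worker-filtering regression: fit ordinary
# least squares from three behavioral metrics to each worker's observed
# mean probability of the correct class, then score with MAE.
# All numbers here are toy data, not values from the study.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients (with intercept) via the normal equations."""
    Xa = [[1.0] + row for row in X]  # prepend intercept column
    k = len(Xa[0])
    XtX = [[sum(r[i] * r[j] for r in Xa) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xa, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted accuracies."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Toy data: three trust/reliability metrics per worker, and each worker's
# observed mean probability of the correct class (all values invented).
X = [[0.9, 0.8, 0.7], [0.5, 0.6, 0.4], [0.8, 0.9, 0.6],
     [0.3, 0.4, 0.5], [0.7, 0.7, 0.8], [0.6, 0.5, 0.6]]
y = [0.85, 0.55, 0.80, 0.40, 0.78, 0.60]

coef = fit_ols(X, y)
preds = [predict(coef, row) for row in X]
print(round(mae(y, preds), 4))  # in-sample MAE of the linear fit
```

A held-out evaluation, as in the paper's second experiment, would fit `coef` on one video set and compute `mae` on predictions for workers labeling a distinct set.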