Poster # Presenting Poster Author Name Last Affiliation / Institution Session/Workshop Area Abstract Type Abstract Title Authors (in the order they appear on the abstract) Abstract Poster DOI or URL
1 Zixuan Wen University of Pennsylvania Big Data Imaging Genomics Accepted proceedings paper with poster presentation Identifying imaging genetic associations via regional morphometricity estimation Jingxuan Bao, Zixuan Wen, Mansu Kim, Andrew J. Saykin, Paul M. Thompson, Yize Zhao, Li Shen, for the Alzheimer's Disease Neuroimaging Initiative
Brain imaging genetics is an emerging research field that aims to reveal the genetic basis of brain traits captured by imaging data. Inspired by heritability analysis, the concept of morphometricity was recently introduced to assess the association of a trait with whole-brain morphology. In this study, we extend morphometricity from its original definition at the whole-brain level to a more focal level based on a region of interest (ROI). We propose a novel framework that identifies SNP-ROI associations by estimating the regional morphometricity of each studied single nucleotide polymorphism (SNP). We perform an empirical study on the structural MRI and genotyping data from a landmark Alzheimer's disease (AD) biobank and obtain promising results. Our findings indicate that AD-related SNPs have higher overall regional morphometricity estimates than SNPs not yet related to AD. This observation suggests that regional morphometric features explain a larger share of the variance of AD SNPs than of non-AD SNPs, supporting the value of imaging traits as targets in studying AD genetics. We also identified 11 ROIs in which AD/non-AD SNP status and the significance of the corresponding SNPs' morphometricity estimates show strong dependency. These ROIs are enriched in the supplementary motor area (SMA) and dorsolateral prefrontal cortex (DPC). Our results also demonstrate that incorporating morphometric information through all the detailed voxel-level measures within an ROI outperforms using a single average ROI measure, and thus provides improved power to detect imaging genetic associations. https://f1000research.com/posters/10-999
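In the original whole-brain formulation, morphometricity is the proportion of trait variance captured by a linear mixed model whose random-effect covariance is an anatomical similarity matrix. A regional analogue can be sketched as follows. It assumes our notation, not necessarily the authors': y holds each subject's standardized genotype at the SNP under study, and K_ROI is a subject-by-subject similarity matrix built from the voxel-level measures inside one ROI. Covariates are omitted for brevity.

```latex
\mathbf{y} = \mathbf{a} + \boldsymbol{\varepsilon},\qquad
\mathbf{a} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_a^2 \mathbf{K}_{\mathrm{ROI}}\right),\qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^2 \mathbf{I}\right),\qquad
m^2_{\mathrm{ROI}} = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}
```

Here the variance components are typically estimated by restricted maximum likelihood (REML). A value of m^2_ROI near 1 would mean that morphology within the ROI explains most of the SNP's variance across subjects.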
2 Akito Yamamoto The University of Tokyo Big Data Imaging Genomics Accepted proceedings paper with poster presentation Efficient Differentially Private Methods for a Transmission Disequilibrium Test in Genome Wide Association Studies Akito Yamamoto, Tetsuo Shibuya To realize personalized medicine, it is very important to investigate the relationship between diseases and human genomes. For this purpose, large-scale genetic studies such as genome-wide association studies are often conducted, but releasing their statistics as is carries a risk of identifying individuals. In this study, we propose new efficient differentially private methods for the transmission disequilibrium test (TDT), a family-based association test. Existing methods are computationally intensive and take a long time even for a small cohort. Moreover, for approximation methods, the sensitivity of the obtained values is not guaranteed. We present an exact algorithm with a time complexity of O(nm) for a dataset containing n families and m single nucleotide polymorphisms (SNPs). We also propose an approximation algorithm that is faster than the exact one and prove that the sensitivity of the obtained scores is 1. Our experimental results show that our exact algorithm is 10,000 times faster than existing methods for a small cohort with 5,000 SNPs. The results also indicate that the proposed method is the first that can be applied to a large cohort, such as one with 1,000,000 SNPs. In addition, we examine which datasets are suitable for our approximation algorithm. Supplementary materials are available at https://github.com/ay0408/DP-trio-TDT. https://f1000research.com/posters/10-968
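For readers unfamiliar with the test itself, the classical, non-private TDT statistic can be computed in a single pass over n trios and m SNPs, i.e., in O(nm) time. The minimal sketch below makes three assumptions. Genotypes are coded as alternate-allele counts (0/1/2) per (father, mother, child) trio. The function names are hypothetical. It reproduces none of the authors' differentially private mechanisms.

```python
from typing import List, Sequence, Tuple

# One trio's genotypes at a SNP: (father, mother, child), each coded as the
# number of alternate alleles (0, 1, or 2). This coding is an assumption.
Trio = Tuple[int, int, int]


def transmission_counts(trios: Sequence[Trio]) -> Tuple[int, int]:
    """Count alternate (b) and reference (c) alleles transmitted by heterozygous parents."""
    b = c = 0
    for father, mother, child in trios:
        het_parents = 0
        fixed_alt = 0  # alternate alleles necessarily passed on by homozygous parents
        for parent in (father, mother):
            if parent == 1:
                het_parents += 1
            else:
                fixed_alt += parent // 2  # genotype 0 passes 0, genotype 2 passes 1
        transmitted_alt = child - fixed_alt
        if not 0 <= transmitted_alt <= het_parents:
            continue  # Mendelian inconsistency: skip this trio
        b += transmitted_alt
        c += het_parents - transmitted_alt
    return b, c


def tdt_statistic(trios: Sequence[Trio]) -> float:
    """McNemar-style TDT chi-square statistic (b - c)^2 / (b + c), 1 degree of freedom."""
    b, c = transmission_counts(trios)
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


def tdt_all_snps(genotypes: List[List[Trio]]) -> List[float]:
    """genotypes[j] holds the n trios at SNP j, so the total cost is O(n * m)."""
    return [tdt_statistic(snp_trios) for snp_trios in genotypes]
```

For example, one SNP with trios (1, 0, 1), (1, 1, 2), and (1, 2, 1) gives b = 3 and c = 1, so the statistic is (3 - 1)^2 / 4 = 1.0.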
3 Joshua Levy Dartmouth Hitchcock Medical Center Human Intrigue: Meta Analysis Approaches for Big Questions with Big Data Accepted proceedings paper with poster presentation Mixed Effects Machine Learning Models for Colon Cancer Metastasis Prediction using Spatially Localized Immuno-Oncology Markers Joshua J. Levy, Carly A. Bobak, Mustafa Nasir-Moin, Eren M. Veziroglu, Scott M. Palisoul, Rachael E. Barney, Lucas A. Salas, Brock C. Christensen, Gregory J. Tsongalis, Louis J. Vaickus Spatially resolved characterization of the transcriptome and proteome promises to provide further clarity on cancer pathogenesis and etiology, which may inform future clinical practice through the development of classifiers for clinical outcomes. However, batch effects may obscure the ability of machine learning methods to derive complex associations from spatial omics data. Profiling 35 stage III colon cancer patients using the GeoMX Digital Spatial Profiler, we found that mixed-effects machine learning (MEML) methods may help overcome significant batch effects and capture key, complex disease associations from spatial information. These results motivate further exploration and application of MEML methods within the spatial omics algorithm development life cycle for clinical deployment. https://f1000research.com/posters/10-980
4 Felix Quintana Lehigh University AI-driven Advances in Modeling of Protein Structure Accepted proceedings paper with oral presentation DeepVASP-E: A Flexible Analysis of Electrostatic Isopotentials for Finding and Explaining Mechanisms that Control Binding Specificity Felix M. Quintana, Zhaoming Kong, Lifang He, and Brian Y. Chen Amino acids that are involved in binding specificity can be identified with many methods, but few techniques identify the biochemical mechanisms by which they act. We hypothesize that an Analytic Ensemble of techniques, each focused on a single kind of biochemical mechanism that influences binding, could break the larger problem into more manageable pieces. Here, we present one technique that can suggest electrostatic mechanisms that influence specificity.

We produced a classifier called DeepVASP-E that applies 3D convolutional neural networks to sort a voxel-based electrostatic representation of ligand binding sites into categories with different ligand binding preferences. It relies exclusively on voxelized electrostatic data, ensuring that any classification it produces is explained at least in part by electrostatic mechanisms.

We hypothesized that voxels that are salient for classification by DeepVASP-E would also be regions of electrostatic isopotential that are crucial for achieving specific binding. We applied Grad-CAM++ to measure classification salience, and then verified the resulting regions against biochemical findings on the proteins in our dataset. Our findings on two families of proteins with electrostatic influences on specificity suggest that large salient regions occur near amino acids that have a substantial electrostatic role in binding. By verifying the explanations generated by our technique against experimentally established explanations in the peer-reviewed literature, we find that our approach can be an effective technique for explaining electrostatic mechanisms that control protein specificity.

https://www.lehigh.edu/~fmq221/Quintana_PSB2022_Poster.pdf
5 Anna Antoniak University of Gdansk AI-driven Advances in Modeling of Protein Structure Poster only Modeling protein structures with the coarse-grained UNRES force field in the CASP14 experiment Anna Antoniak, Patryk A. Wesołowski, Adam K. Sieradzan, Cezary Czaplewski, Emilia A. Lubecka, Agnieszka G. Lipska, Artur Giełdoń, Rafał Ślusarz, Paweł Krupa, Mateusz Kogut, Iga Biskupek, Krzysztof K. Bojarski, Małgorzata Kogut, Mateusz Marcisz, Martyna Maszota-Zieleniak, Sergey A. Samsonov, Magdalena J. Ślusarz, Karolina Zięba, Adam Liwo

Anna Antoniak [1], Patryk A. Wesołowski [1], Adam K. Sieradzan [1], Cezary Czaplewski [1], Emilia A. Lubecka [2], Agnieszka G. Lipska [1], Artur Giełdoń [1], Rafał Ślusarz [1], Paweł Krupa [3], Mateusz Kogut [1], Iga Biskupek [1], Krzysztof K. Bojarski [1], Małgorzata M. Kogut [1], Mateusz Marcisz [1], Martyna Maszota-Zieleniak [1], Sergey A. Samsonov [1], Magdalena J. Ślusarz [1], Karolina Zięba [1], Adam Liwo [1]

[1] Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland

[2] Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, G. Narutowicza 11/12, 80-233 Gdańsk, Poland

[3] Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46, Warsaw, PL-02668, Poland

The UNRES force field was tested in the CASP14 protein-structure prediction experiment, which included larger monomeric and multimeric targets than previous editions. Three prediction modes were tested: (I) ab initio (the UNRES group), (II) contact-assisted (the UNRES-contact group), and (III) template-assisted (the UNRES-template group). The average Global Distance Test Total Score (GDT_TS) of the "Model 1" predictions was 29.17, 39.32, and 56.37 for the UNRES, UNRES-contact, and UNRES-template predictions, respectively. These values are increases of 0.53, 2.24, and 3.76, respectively, over CASP13. We also found that the GDT_TS of the UNRES models obtained in ab initio and contact-assisted modes decreases with the square root of chain length, while the exponent in this relationship is 0.20 for the UNRES-template models and 0.11 for the best-performing AlphaFold2 models. This suggests that incorporating database information, which stems from protein evolution, brings in long-range correlations and thus enables the correction of force-field inaccuracies.

Acknowledgments

This work was supported by grants UMO-2017/25/B/ST4/01026, UMO-2017/27/B/ST4/00926, UMO-2017/26/M/ST4/00044, UMO-2018/30/E/ST4/00037, and UMO-2018/31/N/ST4/01677 from the National Science Center of Poland (Narodowe Centrum Nauki).
www.unres.pl/sites/unres/files/poster2.pdf
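Assume the reported chain-length dependence is a power law, GDT_TS ≈ C·N^(-α) in the chain length N, where C is an unreported constant. Under that assumption, the quoted exponents translate into the following relative change when chain length doubles:

```latex
\mathrm{GDT\_TS}(N) \approx C\,N^{-\alpha}
\quad\Longrightarrow\quad
\frac{\mathrm{GDT\_TS}(2N)}{\mathrm{GDT\_TS}(N)} = 2^{-\alpha} \approx
\begin{cases}
2^{-0.50} \approx 0.71 & \text{UNRES, UNRES-contact}\\
2^{-0.20} \approx 0.87 & \text{UNRES-template}\\
2^{-0.11} \approx 0.93 & \text{AlphaFold2}
\end{cases}
```

Under this assumption, doubling chain length would lower GDT_TS by roughly 29% for the ab initio and contact-assisted modes, 13% for UNRES-template, and 7% for AlphaFold2.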
6 Kristy Carpenter Stanford University AI-driven Advances in Modeling of Protein Structure Poster only SiteFEATURE: Using the Protein Microenvironment for Cryptic Pocket Prediction Kristy A. Carpenter, Yu-Chen Lo, Russ B. Altman Identification of ligand binding pockets is an essential early step in structure-based drug discovery. Most pocket-finding algorithms use geometric approaches to identify cavities on the protein surface that could function as binding pockets. However, it is difficult for geometry-based algorithms to identify cryptic pockets, which form a recognizable cavity only once the ligand is bound and do not appear cavity-like in the apo form. We introduce SiteFEATURE, a non-geometric pocket-finding algorithm that uses features of the protein microenvironment to identify promising binding surfaces. SiteFEATURE compares the microenvironments of surface residues to those of a reference library of known binding pockets to identify binding hotspots. The binding hotspots are clustered into pockets and ranked according to their predicted ligandability. A user can then manually examine the top k predicted pockets to decide which to pursue. We apply SiteFEATURE to a dataset of 79 proteins with cryptic pockets and characterize its ability to predict binding sites from both the holo and apo forms. https://kristycarp.github.io/psb22poster/2022.01.05.psb_sitefeature.pdf
7 Alexander Derry Stanford University AI-driven Advances in Modeling of Protein Structure Poster only COLLAPSE: Continuous Latents Learned from Aligned Protein Structural Environments Alexander Derry, Russ B. Altman Deep neural networks that learn complex features from biomolecular data hold the potential to advance our understanding of disease mechanisms and solve crucial problems in drug discovery and medicine. However, datasets of experimental measurements are often too small for supervised models to fully leverage the richness of the input data, particularly for complex molecules such as proteins. One solution to this problem is to use self-supervised learning on large-scale protein databases to learn generalizable representations that can be transferred to diverse downstream tasks. This approach has recently been applied to great effect on protein sequence data. While these methods show promise for many tasks, a protein’s function arises not directly from its sequence but from the intricacies of its folded conformation in 3D space, particularly around specific structural sites in the protein. These sites often consist of amino acids that are far apart in sequence space, so a representation learned directly from structure may be more effective for tasks in which subtle changes in atomic conformation are critical. We present a novel self-supervised learning framework called COLLAPSE, which combines large-scale protein structure databases with the evolutionary information encoded by multiple sequence alignments (MSAs) to learn embedded representations of protein structural environments. Specifically, representations are optimized to be similar for sites that are aligned in the MSA, thus implicitly capturing common structural and functional roles across protein families.
We show that the learned representations capture relevant biochemical and functional features of protein sites by training simple downstream models on tasks ranging in difficulty from secondary structure prediction to the analysis of mutation effects on protein function and stability. In doing so, we demonstrate the potential of representations learned directly from three-dimensional structures.
8 Jelena Vucinic IRIT AI-driven Advances in Modeling of Protein Structure Poster only Learning sequence-structure relationships with Recurrent Relational Networks Jelena Vucinic, David Simoncini The design of new artificial proteins with novel or improved properties is the goal of Computational Protein Design (CPD). Despite many exciting achievements in the last decade, CPD remains an extremely difficult task with many failures for each success. High-quality in silico assessment measures would improve the success rate of CPD methods by sparing a significant amount of costly and time-consuming experimental validation. Recent breakthroughs in protein structure prediction have shown how helpful deep learning methods can be in computational structural biology. As an initial step towards high-quality in silico assessment measures, we present a Recurrent Relational Network (RRN) for learning sequence-structure relationships. An RRN is a type of neural network embedded on a graph structure. A protein can be represented as a graphical model in which vertices are residues and edges symbolize contacts between pairs of residues. RRNs therefore offer an attractive framework in which vertices are used to learn internal representations of residues. Interacting residues communicate in the RRN by exchanging messages, which are used to iteratively update their internal representations. We evaluate the ability of our network to match compatible protein sequences and structures. Preliminary results show that the network can distinguish natural from random sequences for a given structure, as well as natural from random structures for a given sequence.
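The message-passing loop described above can be sketched in plain Python, but only as a toy under stated assumptions. The learned message and update networks are replaced by hypothetical fixed tanh functions. Residue states are short float lists, and contacts are given as residue-index pairs. The sketch shows only the data flow: messages travel along contacts, are summed per residue, and a shared update is repeated over several steps. It is not the authors' trained model.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def message(h_receiver: Vector, h_sender: Vector) -> Vector:
    # Hypothetical stand-in for the learned message network f(h_i, h_j).
    return [math.tanh(0.5 * a + 0.5 * b) for a, b in zip(h_receiver, h_sender)]


def update(h_old: Vector, incoming: Vector) -> Vector:
    # Hypothetical stand-in for the learned recurrent update g(h, m).
    return [math.tanh(a + m) for a, m in zip(h_old, incoming)]


def rrn_step(h: Dict[int, Vector], contacts: List[Tuple[int, int]]) -> Dict[int, Vector]:
    """One round of message passing over an undirected residue contact graph."""
    dim = len(next(iter(h.values())))
    summed = {i: [0.0] * dim for i in h}
    for i, j in contacts:
        # Each contact carries a message in both directions.
        for receiver, sender in ((i, j), (j, i)):
            msg = message(h[receiver], h[sender])
            summed[receiver] = [s + x for s, x in zip(summed[receiver], msg)]
    return {i: update(h[i], summed[i]) for i in h}


def rrn(h: Dict[int, Vector], contacts: List[Tuple[int, int]], steps: int) -> Dict[int, Vector]:
    """Apply the same (shared-weight) step repeatedly, as in a recurrent network."""
    for _ in range(steps):
        h = rrn_step(h, contacts)
    return h
```

Sharing one step function across iterations is what makes the network recurrent. Information from a residue reaches residues k contacts away after k steps.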
9 Mehran Ghafari University of Tennessee at Chattanooga Big Data Imaging Genomics Poster only Application Note: μPolar - An Interactive 2D Visualization Tool for Microscopic Time-Series Images Mehran Ghafari, Daniel Mailman, Hong Qin Time-lapse microscopy is an effective research tool for monitoring cell behavior and cell divisions. Recent advances in microfluidics have accelerated the adoption of time-lapse microscopy in research. However, it is challenging to visualize and interpret the time-series data gathered through time-lapse microscopy. We have developed a circular plotting software tool, μPolar, to visualize the trends and patterns of cell movements and cell division events in a time series. μPolar is interactive and easy to use. We demonstrate the utility of μPolar by visualizing the events of dividing yeast cells, where cell divisions lead to oscillating plotting patterns, and of migrating mouse fibroblasts, where cell shapes change during migration. μPolar could potentially be applied to other types of microscopic time-series images.
https://f1000research.com/author/poster/thankyou/1118884
10 John Van Horn University of Virginia Big Data Imaging Genomics Poster only Phenoneurogenomic Decomposition of Diagnosis and Sex in Autism Spectrum Disorder John D. Van Horn, Siva Venkadesh, Zachary J. Jacokes, Ian Adoremos, and Kevin N. Pelphrey, for the Autism Centers of Excellence (ACE) GENDAAR Research Consortium

Autism Spectrum Disorder (ASD) represents a collection of developmental conditions whose outward behavioral manifestation defies a single neurological or genetic source. ASD has a diagnostic ratio of 5:1 (boys to girls), is associated with altered brain connectivity, and is linked to ~1,000 genetic loci and ~400 genes. This study, part of the National Institutes of Health (NIH) Autism Center of Excellence (ACE) Program, explored each of these domains and their interactions via comprehensive phenomic assessment, structural and connectomic human neuroimaging, and DNA sequencing to identify the sources of sex-specificity in ASD vs. Typically Developing (TD) children. A formal neuropsychological battery was administered to each subject (N=226), including sub-scales from the DAS, Vineland, SRS, CELF, RBSR, SCQ, BRIEF, and CBCL. T1, EPI, and DWI neuroimaging data were acquired on Siemens 3T Magnetom TrioTim or PRISMA scanners using standardized acquisition protocols. Image preprocessing was performed using FSL and FreeSurfer 7.0, and the images were fit to the atlas of Destrieux et al. Connectivity metrics were computed using DSI Studio and NetworkX. Connectograms describing average morphometry and connectivity in each group were computed. DNA sequencing (Illumina Infinium Omni2.5-8 Kit) was performed and compared against the 1000 Genomes reference. Chromosomes were divided into 10,000 windows, which were used to compute mutation densities and pooled by group to statistically identify sex-specific differences. A multivariate phenomic diagnosis-by-sex interaction was found, driven largely by assessment subscales of executive function and self-control. This suggests that these cognitive domains may be particularly sensitive to sex differences in ASD, unlike in TD children. No volumetric differences in cortical morphometry were observed. Differences in network-theoretic metrics of functional connectivity, however, were present and warrant further exploration.
Gene loci previously linked to ASD show variable mutation rates by sex, suggestive of differential genetic influences. These results support the contention that ASD is particularly sexually dimorphic, which may hinder effective diagnosis.   https://f1000research.com/posters/10-1248
11 Yogasudha Veturi University of Pennsylvania Big Data Imaging Genomics Poster only A framework to integrate neuroimaging data with genomics and EHR to identify pleiotropy Yogasudha Veturi, Anastasia Lucas, Nadia Penrod, Christos Davatzikos, and Marylyn D. Ritchie Several studies have identified genetic overlaps between neuroimaging data and complex human diseases (e.g. Alzheimer’s disease, type II diabetes, stroke). A comprehensive investigation of functional associations connecting neuroimaging phenotypes to diseases in electronic health records (EHR) across the phenome can yield novel image-derived phenotype (IDP) biomarkers for complex diseases and provide pleiotropic gene targets for drug repurposing. We first conducted a phenome-wide imaging study (PheWIS) in UK Biobank to identify disease-IDP associations (n=40,201 samples) between 691 International Classification of Diseases (ICD) codes and 870 structural and diffusion IDPs. Next, we conducted a phenome-wide association study (PheWAS) across 691 ICD codes (n=452,595 samples) on 52,570 Bonferroni-significant genetic variants (p-value<1.9E-12) for 3,935 IDPs from the Oxford Brain Imaging Genetics web server; these variants were also eQTL for 5,523 genes in GTEx v8. PheWIS yielded several expected associations between hypertension, non-insulin-dependent diabetes mellitus, and cerebral infarctions on the one hand and clusters of structural and diffusion MRI IDPs linked to extensive brain atrophy and white matter integrity on the other (e.g. p-value for hypertension and mean L3 in external capsule on fractional anisotropy skeleton=4.62E-69). Interestingly, we also found associations between anemia and T2star MRI, which relates to iron concentration (e.g. p-value for anemia and median T2star in caudate (left)=1.64E-13), and associations for substance abuse disorders (e.g. p-value for mental and behavioral disorders due to tobacco use and volume of grey matter in cerebellum=1.09E-10).
IDP-guided PheWAS yielded 5,888 Bonferroni-significant eQTL (p-value<1.37E-09) for 47 diseases, mapping to 718 eGenes from GTEx v8. Aside from proof-of-concept associations (e.g. rs34404554/TOMM40: Alzheimer’s disease and angina pectoris), we also found pleiotropic associations between IDPs and diseases across the phenome including respiratory (e.g. ORMDL3: asthma), digestive (e.g. GBAP1: diseases of stomach/duodenum), musculoskeletal (e.g. PSMB9: rheumatoid arthritis), eye and adnexa (e.g. CDKN2B: glaucoma), and genitourinary (e.g. SP1: hydrocele and spermatocele) systems.   
12 Yun Hao University of Pennsylvania Human Intrigue: Meta Analysis Approaches for Big Questions with Big Data Accepted proceedings paper with oral presentation Improving QSAR Modeling for Predictive Toxicology using Publicly Aggregated Semantic Graph Data and Graph Neural Networks Joseph D. Romano, Yun Hao, Jason H. Moore Quantitative Structure-Activity Relationship (QSAR) modeling is a common computational technique for predicting chemical toxicity, but a lack of methodological innovation has impeded QSAR performance on many tasks. We show that contemporary QSAR modeling for predictive toxicology can be substantially improved by incorporating semantic graph data aggregated from open-access public databases and analyzing those data with graph neural networks (GNNs). Furthermore, we introspect the GNNs to demonstrate how they can lead to more interpretable applications of QSAR, and use ablation analysis to explore the contribution of different data elements to the final models’ performance. https://github.com/EpistasisLab/qsar-gnn/blob/master/poster/PSB2022_poster.pdf
13 Pei-Chen Peng Cedars-Sinai Medical Center Human Intrigue: Meta Analysis Approaches for Big Questions with Big Data Poster only Identifying non-coding drivers of ovarian cancer by converging germline variants and somatic mutations Pei-Chen Peng, Jonathan Tyrer, Brian Davis, Stephanie Chen, Felipe S. Dezem, Siddhartha Kar, Jasmine Plummer, Ovarian Cancer Association Consortium, Alexander Gusev, Simon Knott, Matthew L. Freedman, Paul Pharaoh, Kate Lawrenson, Simon A. Gayther, Michelle R. Jones Cancer risk is conferred by germline variants, while initiation and development result from somatic mutation. The heritable risk for ovarian cancer is polygenic. In most cases, many alleles each confer a small amount of risk, while some individuals inherit moderately or highly penetrant variants that usually lie in the coding genome. Somatic driver mutations are critical in disease initiation, inactivating tumor suppressor genes or activating oncogenes. It has been more difficult to identify driver mutations in the non-coding genome, as the function of non-coding sequence is less well understood. Advances in whole genome sequencing (WGS) have enabled functional characterization of non-coding germline and somatic variants. We hypothesize that a proportion of germline risk variants and somatic mutations co-locate within regulatory elements (REs), where they disrupt enhancer activity and transcription factor binding and alter the expression of target genes. We integrated germline variants from genome-wide association studies (GWAS) of ovarian cancer (up to 26,151 cases, 105,724 controls) and somatic mutations from WGS of ovarian tumors (n > 200) with ovarian cancer related epigenomic and transcriptomic data to identify co-localized REs that drive disease pathogenesis. We observed significant enrichment of germline variants in ovarian cancer active promoters (p = 0.009).
Somatic mutations mostly occurred in FANTOM5 enhancers (1003/43099 mutated, 2.33%) and ovarian cancer active regions (18/750 mutated, 2.4%). We identified about 50 REs that harbor disease-associated germline variants and are significantly mutated somatically. Mutations in these co-localized REs are enriched in developmental transcription factor binding sites, such as those of the SOX family, FOXO family, and DUX. We further examined target gene expression for co-localized REs and identified 11 genes differentially expressed between mutant and reference alleles. Finally, pathway analysis mapped the co-localized REs to pathways involved in TERT activation in cancer (p = 0.01) and regulation of response to DNA damage stimulus (p = 0.03).
14 Deborah Plana Harvard Medical School Human Intrigue: Meta Analysis Approaches for Big Questions with Big Data Poster only Cancer patient survival can be accurately parameterized, improving trial precision and revealing time-dependent therapeutic effects Deborah Plana, Geoffrey Fell, Brian M. Alexander, Adam C. Palmer, Peter K. Sorger

Individual participant data (IPD) from oncology clinical trials represent an invaluable source of information for identifying factors that influence trial success and failure, improving trial design and interpretation, and comparing pre-clinical studies to clinical outcomes. However, the IPD used to generate published survival curves are not generally available. We imputed survival IPD from ~500 arms of Phase 3 oncology trials (representing ~220,000 events) and found that they are well fit by a two-parameter Weibull distribution. This finding supports the use of parametric statistics to increase trial precision with the small patient cohorts typical of Phase 1 and 2 trials. We also show that frequent violations of the proportional hazards assumption, particularly in Phase 3 trials of immune checkpoint inhibitors, arise from time-dependent therapeutic effects. Trial duration therefore has an underappreciated impact on the likelihood of success. All imputed IPD and analyses are available as supplementary materials and via the website https://cancertrials.io/. https://f1000research.com/posters/10-1232
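To make the Weibull and proportional-hazards points concrete, the sketch below uses the standard two-parameter Weibull survival and hazard functions. It shows that two arms sharing a shape parameter have a constant hazard ratio, i.e., proportional hazards. When the shapes differ, the ratio drifts with time, which is one way a time-dependent therapeutic effect can appear. The parameter values and function names are hypothetical. This is not the authors' fitting code, and it does not estimate parameters from data.

```python
import math
from typing import Tuple

# (scale, shape) of a two-parameter Weibull; the values used below are hypothetical.
WeibullParams = Tuple[float, float]


def weibull_survival(t: float, scale: float, shape: float) -> float:
    """S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, scale: float, shape: float) -> float:
    """h(t) = (shape / scale) * (t / scale) ** (shape - 1), defined for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_median(scale: float, shape: float) -> float:
    """Median survival time, where S(t) = 0.5."""
    return scale * math.log(2) ** (1 / shape)


def hazard_ratio(t: float, treated: WeibullParams, control: WeibullParams) -> float:
    """Treated-to-control hazard ratio at time t."""
    return weibull_hazard(t, *treated) / weibull_hazard(t, *control)


# Same shape: the ratio equals (10 / 20) ** 1.2 at every t (proportional hazards).
same_shape = [hazard_ratio(t, (20.0, 1.2), (10.0, 1.2)) for t in (1.0, 12.0, 36.0)]
# Different shapes: the ratio changes with t (non-proportional hazards).
diff_shape = [hazard_ratio(t, (20.0, 0.8), (10.0, 1.2)) for t in (1.0, 12.0, 36.0)]
```

In the second case the ratio scales as t^(-0.4), so a single summary hazard ratio would depend on how long the trial follows patients.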
15 Van Truong University of Pennsylvania Human Intrigue: Meta Analysis Approaches for Big Questions with Big Data Poster only Utility of Biomedical Knowledge Base Integration for Advancing Immune Health Research Van Q. Truong, Joseph D. Romano, Scott M. Dudek, Allison R. Greenplate, E. John Wherry, Marylyn D. Ritchie The human immune system is composed of many cell types and molecular components, with vast variation across individuals. By nature, biological information is extremely complex and difficult to visualize in its entirety. Despite decades of advancements, methods development and applications of artificial intelligence (AI) in immune health research remain rudimentary. Biomedical informatics is well suited to unify diverse knowledge types spanning the genome, transcriptome, proteome, biological pathways, and disease associations in a machine-readable manner. To bridge this gap, we unite open-access biomedical databases into a central knowledge base (KB) and discuss the development of methods that use the underlying network structure to direct predictive modeling for the dissection of complex immunological diseases.

Here, we employ the Neo4j graph database platform and the Cypher query language, which are well suited for storing and analyzing semantically meaningful relationships within biological and immunological information networks. First, we integrated several sources of curated biological knowledge from Hetionet, ComptoxAI, Reactome, Pathway Commons, and the Library of Knowledge Integration (LOKI), and incorporated additional immunology-related entities: ImmGen, ImmunoGlobe, and immune-associated diseases queried from DisGeNet. In the graph, nodes correspond to individual entities in the source databases, while edges represent the mapped relationships between entities. We are continually developing the knowledge infrastructure to add data types and expand its utility and functionality. Next, we aim to develop graph data science approaches for knowledge discovery, such as heterogeneous graph machine learning (ML) models for link prediction. We will also explore regularization and feature selection strategies to improve ML applications on highly connected, complex information. Together, we believe these efforts will provide a path forward to explore data beyond single data types and embrace a meta-dimensional framework for modeling strategies and applications in Immune Health.
16 Shuangxia Ren University of Pittsburgh Precision Medicine: Using Artificial Intelligence to Improve Diagnostics and Healthcare Accepted proceedings paper with oral presentation De novo Prediction of Cell-Drug Sensitivities Using Deep Learning-based Graph Regularized Matrix Factorization Shuangxia Ren, Yifeng Tao, Ke Yu, Yifan Xue, Russell Schwartz, Xinghua Lu Application of artificial intelligence (AI) in precision oncology typically involves predicting whether the cancer cells of a patient (previously unseen by AI models) will respond to any of a set of existing anticancer drugs, based on the responses of previous training cell samples to those drugs. To expand the repertoire of anticancer drugs, AI has also been used to repurpose drugs that have not been tested in an anticancer setting, i.e., to predict the anticancer effects of a new drug on previously unseen cancer cells de novo. Here, we report a computational model that addresses both of these tasks in a unified AI framework. Our model, referred to as deep learning-based graph regularized matrix factorization (DeepGRMF), integrates neural networks, graph models, and matrix-factorization techniques to use diverse information to predict cell responses to drugs. This information covers drug chemical structures, the drugs' impact on cellular signaling systems, and the cellular states of cancer cells. DeepGRMF learns embeddings of drugs so that drugs sharing similar structures and mechanisms of action (MOAs) are closely related in the embedding space. Similarly, DeepGRMF learns representation embeddings of cells such that cells sharing similar cellular states and drug responses are closely related. Evaluation of DeepGRMF and competing models on the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets shows its superior prediction performance.
Finally, we show that the model is capable of predicting the effectiveness of a chemotherapy regimen on patient outcomes for lung cancer patients in The Cancer Genome Atlas (TCGA) dataset. https://github.com/renshuangxia/DeepGRMF/blob/main/drug_sensitivity_prediction_psb_poster.pdf
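The graph-regularized matrix factorization idea named in the model's title can be illustrated with its classic objective. The objective is squared reconstruction error on observed cell-drug responses, plus Laplacian penalties that pull together the embeddings of similar cells and similar drugs. This is a minimal sketch under assumptions. The embeddings are given directly as lists of floats, and the similarity graphs as weighted edge dictionaries; these names are hypothetical. DeepGRMF itself learns its embeddings with neural networks from drug structures, signaling impact, and cellular states, which is not shown. No optimizer is included.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Edges = Dict[Tuple[int, int], float]  # undirected edge (i, k) -> similarity weight


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def graph_smoothness(emb: Matrix, edges: Edges) -> float:
    """tr(E^T L E) with L = D - W, i.e. the sum of w_ik * ||e_i - e_k||^2 over undirected edges."""
    return sum(
        w * sum((a - b) ** 2 for a, b in zip(emb[i], emb[k]))
        for (i, k), w in edges.items()
    )


def grmf_loss(
    observed: Sequence[Tuple[int, int, float]],  # (cell index, drug index, response)
    cell_emb: Matrix,
    drug_emb: Matrix,
    cell_graph: Edges,
    drug_graph: Edges,
    lam_cell: float,
    lam_drug: float,
) -> float:
    """Squared error on observed entries plus graph-Laplacian penalties on both embeddings."""
    fit = sum((r - dot(cell_emb[c], drug_emb[d])) ** 2 for c, d, r in observed)
    return (
        fit
        + lam_cell * graph_smoothness(cell_emb, cell_graph)
        + lam_drug * graph_smoothness(drug_emb, drug_graph)
    )
```

Only observed entries enter the fit term. Predictions for unseen cells or drugs therefore come from the dot products of embeddings that the graph penalties keep close to those of similar, observed neighbors.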
17 Javier Blanco Portillo Stanford University Precision Medicine: Using Artificial Intelligence to Improve Diagnostics and Healthcare Poster only Founder Populations in Precision Medicine: The Hawaiians Javier Blanco-Portillo, Mark Penhueli, Feiyang Liu, Charleston Chiang, Patrick Kirch, Christopher R. Gignoux, Marcus Feldman, Keolu Fox, Genevieve Wojcik, Alexander Ioannidis

Available evidence confirms that the first inhabitants of Hawai’i trace their origins to the Austronesian-speaking founders of Eastern Polynesia. However, the proximate origin and timing of the settlement of Hawai’i are still matters of debate. We used ancestry-specific approaches to explore the genomic variation of modern Hawaiians and individuals from other Polynesian islands. We find evidence of a Tuamotuan origin for the inhabitants of Hawai’i in the 12th century, in contrast with previous theories that posited an origin in the Marquesas or Society Islands. We characterize the founder effects and population bottlenecks that resulted from this migratory process, and describe why they are vital for the development of precision medicine in modern Hawaiians. https://www.dropbox.com/s/bqh7nxn4hpyplim/hawaii_poster_horizontal_precisionMedicine.pdf?dl=0
18 Francisco De La Vega Tempus Labs, Inc. Precision Medicine: Using Artificial Intelligence to Improve Diagnostics and Healthcare Poster only Ancestry inference from targeted NGS tests to enable precision medicine and improve racial/ethnic representation in clinical trials Francisco M. De La Vega, Brooke Rhead, and Sean A. Irvine There are well-established racial and ethnic disparities in cancer incidence and outcomes, in part due to structural, socioeconomic, environmental, and behavioral factors. However, some of these differences can be attributed to biological factors, such as the frequencies of cancer mutations, which vary by ancestry. Further, race and ethnicity are infrequently reported in clinical trials and are missing in up to 50% of patient medical records and genomic profiling tests. Representative participation in clinical trials would help minimize disparities in outcomes and enable the assessment of biological differences that may determine differential drug efficacy in oncology and other indications. Thus, ascertaining diversity in real-world genetic testing and clinical trial cohorts is needed. Rather than relying on self-reported race/ethnicity labels, ancestry can be inferred directly from sequencing data collected during tumor profiling and other tests. Ancestry is usually inferred from genome-wide data, either array or whole-genome sequencing (WGS), using unlinked random markers and clustering methods; however, this approach is inappropriate for targeted next-generation sequencing (NGS) gene panels or even whole-exome sequencing (WES) data. Instead, we selected 654 ancestry informative markers (AIMs) overlapping the coding regions of 648 cancer genes targeted by the Tempus xT NGS assay. We implemented an ancestry inference algorithm using the selected AIMs to infer global ancestry proportions at the continental level (i.e., Africa, America, Europe, East Asia, and South Asia).
We validated our methods by comparing our results with data from the 1000 Genomes Project and local ancestry inference previously derived in the ICGC PCAWG project. We show that this method can infer ancestry and admixture proportions from targeted NGS testing data where race/ethnicity was missing, and we report concordance with self-described labels where available. Our results suggest that inferred ancestry can facilitate research on how ancestry correlates with cancer mutations and outcomes. https://f1000research.com/posters/10-1118
19 Guoqian Jiang Mayo Clinic Precision Medicine: Using Artificial Intelligence to Improve Diagnostics and Healthcare Poster only Building Clinical Knowledge Graphs in FHIR RDF for Explainable AI Applications in Healthcare Guoqian Jiang, Emily Pfaff, Christopher G. Chute, Guohui Xiao HL7 Fast Healthcare Interoperability Resources (FHIR) is rapidly becoming the standards framework for the exchange of electronic health record (EHR) data. FHIR Resource Description Framework (RDF) has become the first mainstream clinical data standard to incorporate the Semantic Web vision. The combination of FHIR, knowledge graphs, and the Semantic Web enables a new paradigm for building classification and explainable artificial intelligence (AI) applications in healthcare. However, there is a critical need to build FHIR-based data access and query capabilities on top of existing relational data sources to generate clinical knowledge graphs (CKGs) that facilitate standards-based semantic data integration, sharing, and discovery in broader scientific research communities. The objective of this study is to develop and evaluate methods and tools that expose OMOP CDM-based clinical data repositories as virtual clinical knowledge graphs in FHIR RDF. We developed a FHIR-Ontop-OMOP system to generate virtual clinical knowledge graphs against OMOP relational databases. The system consists of the following modules: 1) an input module including the FHIR model ontology, an OMOP CDM-based data repository, and OMOP-FHIR mappings represented by a mapping template; 2) a CKG generation module, which uses the ontology-based data access tool Ontop as its engine; 3) a SPARQL endpoint and validation module; and 4) a Semantic Web and AI application module. We evaluate the system in terms of its portability across different institutions, the faithfulness of data transformation, and the conformance of the generated RDF graphs to the FHIR RDF specification.
This study is supported in part by the NIH FHIRCat R01 grant (R01 EB030529). https://github.com/fhircat/FHIRCat/raw/master/presentations/PSB-2022-Abstract-v1.0.pdf
20 Kord Kober University of California San Francisco Precision Medicine: Using Artificial Intelligence to Improve Diagnostics and Healthcare Poster only A Data-Integrated Multi-Omics Approach to Study Cancer-Related Fatigue Kord M. Kober Cancer-related fatigue (CRF) is a highly prevalent symptom that occurs during and persists after cancer treatment. CRF has significant negative effects on patients’ mood and ability to function. In a sample of oncology patients (n=1343) undergoing chemotherapy (CTX), the first aim of our study is to evaluate the underlying mechanisms of CRF by comparing patients with high versus low levels of CRF in terms of gene expression, genetic, and epigenetic changes (i.e., “omics”). Our second aim is to develop a risk prediction model that could assist in identifying patients at greatest risk for high levels of CRF. Here we report the current findings of this study. Perturbed KEGG signaling pathways are associated with evening fatigue (e.g., antigen processing and presentation, and cytokine-cytokine receptor interaction) and morning fatigue (e.g., oxytocin, serotonergic synapse). This is the first study to identify perturbations in neuroendocrine and inflammatory pathways that are associated with morning fatigue among patients receiving CTX. In an exploratory analysis of a subset of patients diagnosed with breast cancer (n=116), morning fatigue group membership was associated with methylation of one eCpG (cg01802117) for SCP2. Prediction models of fatigue severity one week following CTX were created using multivariable linear regression, RPART, random forest (RF), support vector machine, LASSO, and elastic net. For evening fatigue, two of the 13 individual Lee Fatigue Scale (LFS) items (i.e., “worn out”, “exhausted”) were the strongest predictors. For morning fatigue, the Karnofsky Performance Status score, two LFS items (i.e., “worn out”, “exhausted”), quality of sleep, and excessive daytime sleepiness were the strongest predictors.
This finding suggests that clinicians can ask patients to rate their level of feeling “worn out” or “exhausted” prior to CTX to estimate their evening fatigue in the week following chemotherapy. The use of these descriptive single items may facilitate the assessment of fatigue in a busy oncology clinic. https://f1000research.com/posters/10-1249
21 Sarbesh Pandeya Beth Israel Deaconess Medical Center, Harvard Medical School Precision Medicine: Using Artificial Intelligence to Improve Diagnostics and Healthcare Poster only Using machine learning algorithms to enhance the diagnostic performance of electrical impedance myography and electromyography Seward B. Rutkove Electromyography (EMG) is a technique that measures the electrical activity of a muscle in response to nerve stimulation. Electrical impedance myography (EIM) is a non-invasive technique that uses high-frequency, low-intensity electrical current to evaluate muscle structure and composition. Unlike EMG, EIM does not measure the intrinsic electrical activity of muscle tissue; nevertheless, both procedures are used to detect muscle abnormalities. They are also applied to track the progression of muscle disorders and to evaluate treatment efficacy. However, these methods have rarely been used to classify disease patterns, even though EMG alone, EIM alone, and combined EMG-EIM data provide detailed information on muscle activity with significant potential for distinguishing neuromuscular conditions. We therefore assessed the classification performance of these three data configurations using machine learning (ML) techniques. For this study, EIM and EMG data were obtained from two groups of animal models of neuromuscular disease: a model of Duchenne muscular dystrophy (the D2-mdx mouse) and a model of fat-related atrophy (the db/db diabetic obese mouse). We also included a single group of wild-type (WT) animals. The ML models were evaluated based on their receiver operating characteristic (ROC) curves and the area under the curve (AUC). The results showed that the combined datasets had a higher capability to detect and classify a given neuromuscular condition. In conclusion, consideration should be given to combining EMG and EIM data for disease assessment.
The study also suggests that machine learning is a valuable tool for improving diagnostic precision and should be considered in future clinical and diagnostic applications of EIM and EMG.
22 Diogo Ribeiro University of Lausanne Precision Medicine: Using Artificial Intelligence to Improve Diagnostics and Healthcare Poster only Discovery of local gene co-expression regulation through single cell analysis Diogo Ribeiro, Chaymae Ziyani, Olivier Delaneau

Nearby genes are often expressed as a group. This local gene co-expression is more pronounced in the immediate vicinity of a gene (e.g. <100 kb) but it has also been shown to extend further and occur regardless of strand, transcriptional orientation and shared functionality. Recent studies highlight the existence of regulatory domains (e.g. groups of enhancers) orchestrating the organised expression of nearby genes (Delaneau et al. 2019 Science). By leveraging gene expression measurements from the GTEx project across 49 human tissues and hundreds of individuals, we have previously found local gene co-expression to be highly prevalent (Ribeiro et al. 2021 Nat. Comm.), occurring in 13% to 53% of genes per tissue.

Here, to understand how the observed local gene co-expression and its regulation manifest at the single-cell level, we analysed a public single-cell RNA-seq dataset across 87 genotyped individuals in a specific cell type (iPSC, Cuomo et al. 2020 Nat. Comm.). By taking advantage of co-expression measurements across cells per individual, we (i) confirm the widespread local co-expression of thousands of gene pairs at the single-cell level, (ii) compare single-cell to bulk RNA-seq in identifying local gene co-expression and (iii) identify enhancers involved in local gene co-expression by analysing multimodal single-cell data (RNA-seq+ATAC-seq performed on the same cells). In addition, we show that local co-expression between two genes (i) involves concomitant co-transcription (GRO-seq data) and (ii) ultimately results in correlated protein levels (mass spectrometry data), evidencing the importance and potential functional link of locally co-expressed genes.

Our dissection of the genetic architecture of local gene co-expression through multiple technologies (RNA-seq, GRO-seq, ATAC-seq, proteomics) allows us to propose a model in which the co-expression of nearby genes is prevalent and largely due to the sharing of regulatory elements such as enhancers.

https://diogomribeiro.github.io/single_cell_poster.pdf
23 Søren Brunak University of Copenhagen General Poster only POPULATION-WIDE ANALYSIS OF PRESCRIPTION TRAJECTORIES: 7.2 MILLION DANISH PATIENTS OVER 25 YEARS WITH 1.1 BILLION REDEEMED PRESCRIPTIONS Alejandro Aguayo-Orozco, Amalie Dahl Haue, Isabella Friis Jørgensen, David Westergaard, Pope Lloyd Moseley,

Laust Hvas Mortensen and Søren Brunak
It is unknown how sequential drug patterns convey information on a patient’s health status, and treatment guidelines rarely account for this. Drug-agnostic longitudinal analyses of prescription trajectories in a population-wide setting are needed. In this cohort study, we used 24 years of data (1.1 billion prescriptions) from the Danish prescription registry to model the risk of sequentially redeeming one drug after another. Drug pairs were used to build multistep longitudinal prescription trajectories, which were subsequently used to stratify patients and calculate survival hazard ratios between the stratified groups. The similarity between prescription histories was used to determine individuals’ best treatment option. Over the course of 122 million person-years of observation, we identified 9 million common prescription trajectories and demonstrated their predictive power using hypertension as a case. Prescription trajectories can provide novel insights into how individuals’ drug use changes over time, identify suboptimal or futile prescriptions, and suggest initial treatments different from first-line therapies. Observations of this kind may also be important when updating treatment guidelines. https://doi.org/10.7490/f1000research.1118885.1
24 Sakhaa Alsaedi King Abdullah University of Science and Technology (KAUST) General Poster only Automated genetic-based medical diagnostic system for treatment of infectious diseases using causal deep learning Sakhaa Alsaedi, Katsuhiko Mineta, Xin Gao, Takashi Gojobori In infectious diseases, molecular diagnostics are revolutionizing clinical practice by helping doctors understand a patient's infection before symptoms and complications arise. Moreover, machine learning algorithms that assist doctors in clinical decision-making and diagnosis can play a critical role in treatment decisions and patient outcomes. However, current automated diagnosis systems only use associative deep learning methods that identify diseases strongly correlated with a patient's symptoms, without considering the genetic risk factors that may cause complications. Such complications may also be related to other complex disorders affecting the patient. Understanding how different viral strains affect individual patients and, in particular, how they interact with different human host cells and immune responses is therefore a fundamental step toward formulating accurate treatment plans. Since the outbreak of COVID-19, host genetic variation has emerged as a significant factor in the varying degrees of illness severity among individuals, which makes this disease a natural first case study for our research. We therefore developed a deep learning model that provides automated medical plans and predicts the severity score as well as multi-organ dysfunction scores during infection by integrating genetic data with metadata and analyzing risk factors. Our preliminary results show that our model outperforms the state of the art on synthetic data. These data were generated from descriptive information on the severity of COVID-19 in patients, drawn from scientific articles and medical reports.
In addition, we are testing the models on actual medical records, although such reports are sensitive and difficult to obtain. The predicted scores can help doctors better understand COVID-19 cases and formulate accurate treatment plans that could eventually reduce the severity and complications of infectious diseases.
25 Mickayla Bacorn University of Maryland, Baltimore County General Poster only Bioinformatic Analysis of the Active Site Diversity for SARS-CoV-2 and Other Coronaviruses MaryAgnes Balogun, Cassandra Olivas, Amy Wu Wu, Joseph H Lubin, Stephen K Burley, Sagar D Khare, Christine Zardecki Coronaviruses have been a source of significant risk to global health. Over five million lives have been lost to COVID-19, the disease caused by SARS-CoV-2. Major efforts are ongoing to mitigate the current pandemic and future outbreaks by designing broad-spectrum drugs that target coronaviruses across species. We identified the positions of residues that participate in binding by using the 3D visualization tool Mol* (https://molstar.org/) to view known inhibitors interacting with the active sites of the SARS-CoV-2 proteases, two enzymes essential to viral maturation. Experimental structures from the Protein Data Bank (PDB) were used where available, and additional models were generated using Robetta (https://robetta.bakerlab.org/). Sequences for additional Coronaviridae proteases were obtained from NCBI (https://blast.ncbi.nlm.nih.gov/Blast.cgi). A sequence-based comparison was performed using Clustal (https://www.ebi.ac.uk/Tools/msa/clustalo/) and a structure-based comparison was performed using Dali (http://ekhidna2.biocenter.helsinki.fi/dali/), both using SARS-CoV-2 as the template. The preliminary results show that at some positions, mutations remain within the same amino acid classification (groupings by structural, chemical, and functional similarity). Other mutations result in a different classification at that position, which may have a significant impact on active site structure and binding. We will continue to analyze these changes to identify patterns in the mutations and to determine which mutations have the most impact and are most relevant to potential viral evasion of broad-spectrum drugs. https://f1000research.com/posters/10-1212
26 Cosmin Bejan Vanderbilt University General Poster only DrugWAS: Drug-wide association studies for COVID-19 drug repurposing Cosmin A. Bejan, Katherine N. Cahill, Patrick J. Staso, Leena Choi, Josh F. Peterson, Elizabeth J. Phillips This study aimed to systematically investigate whether any of the drugs available in electronic health records (EHRs) can be repurposed as potential treatments for COVID-19. Based on a retrospective cohort analysis of EHR data, drug-wide association studies (DrugWAS) were performed on COVID-19 patients at Vanderbilt University Medical Center (VUMC). For each drug study, multivariable logistic regression with propensity score overlap weighting was applied to estimate the effect of drug exposure on COVID-19 disease outcomes. Exposure to a drug between 3 months prior to the pandemic and COVID-19 diagnosis was chosen as the exposure of interest. All-cause death was selected as the primary outcome. Hospitalization, admission to the intensive care unit (ICU), and need for mechanical ventilation were identified as secondary outcomes. Of the 9,748 COVID-19 patients included in the study, 667 (6.84%) were hospitalized, 105 (1.08%) were admitted to the ICU, 84 (0.86%) received mechanical ventilation, and 138 (1.42%) died. The mean age was 42, and most patients were female (60.2%), White (84.2%), and non-Hispanic or Latino (96.5%). Overall, 17 drugs were significantly associated with decreased COVID-19 severity. Previous exposure to two types of 13-valent pneumococcal conjugate vaccine (PCV13) (OR, 0.31; 95% CI, 0.12-0.81 and OR, 0.33; 95% CI, 0.15-0.73) and to diphtheria toxoid and tetanus toxoid vaccine (OR, 0.38; 95% CI, 0.15-0.93) was significantly associated with a decreased risk of death (primary outcome).
Secondary analyses identified several other significant associations showing lower risk for COVID-19 outcomes: acellular pertussis vaccine, 23-valent pneumococcal polysaccharide vaccine (PPSV23), flaxseed extract, ethinyl estradiol, estradiol, turmeric extract, ubidecarenone, azelastine, pseudoephedrine, dextromethorphan, omega-3 fatty acids, fluticasone, and ibuprofen. In conclusion, this cohort study leveraged EHR data to identify a list of drugs that could be repurposed to improve COVID-19 outcomes. Further randomized clinical trials are needed to investigate the efficacy of the proposed drugs. http://adi.bejan.ro/papers/2022_PSB_Bejan_etal_DrugWAS.pdf
27 Carly Bobak Dartmouth College General Poster only Genetic Sequencing Algorithms Allow for the Discovery of Directed Patterns in Patient Sharing Pathways Carly Bobak, James O'Malley The physician-patient relationship may play an important role in patient health outcomes. Physicians do not act as isolated actors, but collectively as groups wherein each physician contributes to the patient’s overall outcome. Patient sharing networks, in which bipartite networks of patients linked to physicians are projected onto unipartite networks where physicians are connected by edges representing shared patients, allow researchers to better study how the physician ecosystem influences health outcomes. Such networks can be constructed from insurance claims, such as Medicare data, or from electronic medical records.



Traditionally, these patient sharing networks have been considered undirected: an edge exists if two physicians share a patient and is absent otherwise. Recently, researchers have proposed using directed patient sharing networks, where the referral pathways between physicians are summarized using more sophisticated functions to retain more information about the referral paths when forming the network. By considering more general referral paths, we can look beyond physician pairs and ask how referral paths among groups of physicians are associated with patient outcomes, and whether revisits to previously seen physicians are associated with outcomes. Furthermore, we can compare outcomes between patients who revisit Physician A between specialty visits (ABACA) and patients who travel directly through a pathway of physicians (ABC).



In this exploratory work, we propose using genetic sequencing algorithms to identify encoded motifs in physician referral pathways. We encode physician paths such that the first physician seen is labeled ‘A’, the second ‘B’, and so on. We count these motifs and characterize their distribution, explore the possibility of using them to cluster similar patients, and consider the implications of these results for how we describe and analyze directed patient sharing networks.

 

https://f1000research.com/author/poster/preview/1118892
28 William Bone University of Pennsylvania General Poster only ColocQuiaL: A QTL-GWAS colocalization pipeline William P. Bone, Brian Y. Chen, Kimberly Lorenz, Michael Levin, Marylyn D. Ritchie, Benjamin F. Voight Identifying the genomic features responsible for genome-wide association study (GWAS) signals has proven to be a difficult challenge. One way to link GWAS associations to a predicted gene of action is to connect them with molecular phenotype quantitative trait loci (QTLs), such as those associated with variation in transcript expression (eQTLs) and in the proportion of alternatively spliced transcripts (sQTLs). To provide a common, reproducible framework for performing colocalization analyses between QTLs and complex trait data at moderate computational scale, we present ColocQuiaL. The ColocQuiaL pipeline provides a framework to perform GWAS-QTL colocalization analyses across the genome with the eQTL and sQTL data collected by the Genotype-Tissue Expression (GTEx) project across >40 tissues. ColocQuiaL returns summary files with the results of all the colocalization analyses it performs, as well as locus visualization plots that allow detailed review of the results.

As an example, we used ColocQuiaL to perform colocalization between the latest type 2 diabetes (T2D) GWAS data and GTEx v8 single-tissue eQTL/sQTL data. In total, we performed 48,557 colocalizations between T2D signals and eQTL/sQTL signals. Our results show that these T2D GWAS signals colocalize with QTL signals for many of the expected genes, including QTLs for three maturity-onset diabetes of the young (MODY) genes and many of the genes reported in recent colocalization experiments between T2D GWAS and islets of Langerhans eQTLs, as well as between T2D GWAS and eQTL/sQTL signals from a subset of GTEx tissues.

In summary, ColocQuiaL streamlines post-GWAS colocalization analyses and returns all of the output a user needs to assess the quality of the results and identify which GTEx QTL signals are candidates for follow-up analyses.

 
29 Li Shen University of Pennsylvania General Poster only Identifying Drug Interaction Effects on Myopathy at the ATC Group Level Jackson Dooley, Brian Lee, Augustin Liu, Lei Wang, Xia Ning, Lang Li, Li Shen Background: We previously studied a novel pharmacovigilance problem: mining directional adverse drug interaction effects (ADEs) on myopathy using the FDA Adverse Event Reporting System (FAERS) database. With over 1,500 FDA-approved drugs, the number of candidate directional ADEs among two-drug combinations can be prohibitively large. To overcome this limitation, we propose performing higher-order directional ADE analysis at the Anatomical Therapeutic Chemical (ATC) group level.



Method: We employed the 3rd-level ATC codes to define drug groups based on their shared therapeutic properties. Each drug group was re-annotated as a new drug and treated just like individual drugs. The analysis was applied to the FAERS data, which were de-duplicated and standardized using the AEOLUS pipeline. Using frequent itemset mining, we analyzed frequent ATC combinations in FAERS and estimated the odds ratio (OR) of the directional myopathy risk for adding a new medication to an existing one.



Results: Selected combinations of antecedents and consequents for ATC code combinations were analyzed, and the top odds ratio findings were reported. The top finding, S01H, S01A → S01H, S01A, A02B, starts from a two-ATC-code combination of local anesthetics (S01H) and anti-infectives (S01A); adding a drug from the A02B group (drugs for peptic ulcer and gastro-oesophageal reflux disease) produces a drug-drug interaction associated with a substantial increase in the risk of myopathy (OR = 43.093).



Conclusion: We proposed an innovative method to perform higher-order directional ADE analysis at the group level, using ATC codes to define drug groups. Our results indicate that ATC-level groupings allow targeted analysis of interesting drug subgroups, offering a new lens for interpreting drug-drug interactions (DDIs) that could potentially lead to the discovery of previously unknown relationships.

http://lishenlab.com/posters/PSB2022-ATC-DDI-poster.pdf
30 Theodore Drivas Perelman School of Medicine at the University of Pennsylvania General Poster only Widespread somatic mosaicism in a population-level biobank Theodore G. Drivas, Tomoki T. Nomakuchi, Zoe C. Bogus, Maria Bonanni, Anna Raper, Staci Kallish, Katherine L. Nathanson MD, Marylyn D. Ritchie Neurofibromatosis type 1 (NF1), caused by pathogenic variants in the NF1 gene, is a relatively common autosomal dominant genetic disorder classically characterized by complete penetrance. We have recently been referred four patients with incidentally discovered pathogenic NF1 variants, but with no features of the syndrome on exam or history. We hypothesized that the true population-level incidence of NF1 pathogenic variants might be higher than reported, with reduced penetrance or a higher incidence of somatic mosaicism than is currently known.



We investigated this hypothesis in the Penn Medicine Biobank (PMBB), which contains EHR and WES data on 43,731 individuals. We identified 51 individuals with pathogenic NF1 predicted loss-of-function (pLOF) variants, equating to an incidence of 1 in 850, 3 times higher than commonly accepted estimates. Only 20 of these 51 individuals had an NF1 diagnosis in their medical record. The 31 NF1 pLOF carriers without an NF1 diagnosis were found to have significantly lower variant allele frequencies for their NF1 variant, suggesting that, in these individuals, these NF1 variants might exist in a somatic mosaic state. PheWAS analysis demonstrated that numerous phenotypes were significantly associated with the presence of an NF1 pLOF variant in PMBB, but these associations were entirely driven by the 20 patients with a known NF1 diagnosis.



Our experience with NF1 led us to examine the incidence of somatic mosaicism on a larger, population-level scale. Within PMBB, there is clear evidence that nearly all individuals harbor multiple likely somatic-mosaic variants in various genes, with certain genes being significantly enriched for somatic mosaic variants, at least in peripheral blood. These genes are, in general, associated with cell proliferation/survival. This identification of widespread somatic mosaicism has major implications for future genetic testing and biobanking efforts, and the further investigation of this finding will be critical for accurate counseling of patients and families.

 
31 Karl Keat University of Pennsylvania General Poster only Surveying the Frequency and Impact of Pharmacogenetic Variants in an Ancestrally Diverse Biobank Population Karl Keat, Binglan Li, Glenda Hoffecker, Marjorie Risman, Scott Dudek, Katrin Sangkuhl, Anurag Verma, Michelle Whirl-Carrillo, Ryan Whaley, Mark Woon, Teri E. Klein, Marylyn D. Ritchie, Shefali Verma, Sony Tuteja Although the Clinical Pharmacogenetics Implementation Consortium (CPIC) has thus far published 26 actionable guidelines for drug-gene interactions, their use in the clinic has remained low. Using the Pharmacogenomics Clinical Annotation Tool (PharmCAT), we quantified the frequencies (f) of actionable genotypes in a diverse population of over 43,000 patients from the Penn Medicine BioBank (PMBB) and demonstrated an urgent need for implementing pharmacogenetic testing in clinical care. Across the entire dataset of 43,000 patients, 100% of patients had a non-reference allele in at least one of the 15 studied pharmacogenes. To demonstrate the clinical implications of these findings, we took a deep dive into the gene-drug pair of CYP2C19 and clopidogrel, an antiplatelet drug used to prevent recurrence of major adverse cardiovascular events (MACE). Our analysis revealed that loss-of-function (LOF) variants in CYP2C19 are more common in individuals of African (n = 11,156, f = 0.377), East Asian (n = 677, f = 0.612), and South Asian (n = 562, f = 0.553) ancestry than in individuals of European ancestry (n = 30,008, f = 0.273). The trend is particularly pronounced in the South Asian and East Asian ancestry groups, where reduced-function phenotype frequencies are more than double that of the European ancestry group (27%).
We analyzed clopidogrel usage and rates of subsequent MACE in PMBB electronic health record (EHR) data and found that patients with two LOF alleles in CYP2C19 have significantly higher rates of MACE than those with normal or increased function phenotypes (p = 0.013). Together, these findings point to the urgency of implementing pharmacogenetic testing in the clinic to both increase the overall standard of care and combat the medical inequality created by the current standard.

 
32 Youngju Kim Department of Biological Sciences, Sookmyung Women’s University General Poster only Landscaping prognostic gene expression and anticancer targets using sample- and lineage-consensus scores derived from pan-cancer omics data Youngju Kim, Jieun Lee, Euna Jeong, and Sukjoon Yoon Integration of large-scale multi-omics data enables extensive analysis of cross-associated data pairs between collateral datasets such as mutation, RNA/protein expression, RNAi/drug screen, and clinical information. Here, we investigated the reproducibility of significant associations by measuring sample consensus and/or lineage consensus for given data pairs (gene expression vs. patient survival, gene expression vs. RNAi efficacy). The sample consensus score is the number of significant (p<0.01) associations for a given data pair across varied sample groups within a lineage, while the lineage consensus score is the number of significant associations across diverse lineages.

~74% of prognostic gene expression associations (i.e., gene expression associated with patient survival) were found exclusively in a single lineage, while only ~1.8% of prognostic genes were significant in more than three lineages (consensus score >3). For sample consensus, fewer prognostic genes were shared between genders than were exclusive to one gender. However, prognostic genes with high sample consensus scores generally exhibited high lineage consensus scores, implying that consensus scoring might improve the reproducibility of predicted prognostic gene expression in other cohorts or cancer lineages. Furthermore, the lineage consensus score for the association between sgRNA (CRISPR) efficacy and RNA expression could predict the association between shRNA efficacy and RNA expression.

This study suggests that consensus scoring helps smart data mining prioritize true positives for further validation. The concept of consensus scoring has been implemented in Q-omics software (http://qomics.sookmyung.ac.kr) for analyzing associations between diverse cancer omics data.

https://f1000research.com/posters/10-1267
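The sample and lineage consensus scores described above are counts of significant tests across groups. A minimal Python sketch, assuming per-group p-values for one data pair have already been computed; the function name `consensus_score` and the example p-values are hypothetical and are not taken from Q-omics.

```python
def consensus_score(p_values, alpha=0.01):
    """Count the groups (sample groups or lineages) in which p < alpha."""
    return sum(1 for p in p_values if p < alpha)


# Hypothetical p-values for one gene-expression vs. survival pair,
# tested separately in each lineage (illustrative values only).
lineage_p = {"lung": 0.003, "breast": 0.2, "liver": 0.0005, "colon": 0.04}
print(consensus_score(lineage_p.values()))
```

The same function gives a sample consensus score when the p-values come from different sample groups within one lineage instead of from different lineages.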
33 Lucien Krapp EPFL General Poster only Parameter-free atomistic geometric deep learning for accurate prediction of protein interfaces Lucien Krapp, Luciano Abriata, Fabio Cortes, Matteo Dal Peraro Predicting the interactions that a protein can establish with other molecules from its structure remains a major challenge. As shown by recent applications to tertiary structure prediction, and in contrast to current mainstream methods for interaction interface prediction, low-level, geometry-based, physicochemical-agnostic representations of structures have several advantages over methods that require pre-calculation of surfaces, charges, hydrophobicity, and other kinds of parameterizations. Here we introduce a new geometric transformation that acts directly on protein atoms labeled with nothing more than their element names and that enables the prediction of interaction interfaces with other proteins, nucleic acids, and small molecules at high confidence, without the need for any preliminary calculation or parametrization of the system physics. A model trained to predict protein-protein interaction interfaces outperforms the state of the art at a very low computational cost. A generalized model trained to predict and distinguish other kinds of protein interfaces also performs with high confidence. The low computational cost of this method (available for use online at https://pesto.epfl.ch) enables processing molecular dynamics trajectories within minutes, which we show allows the discovery of interfaces that remain inconspicuous in static X-ray structures.
34 Nicholas Larson Mayo Clinic General Poster only Statistical Finemapping Analysis of Splice Quantitative Trait Loci Nicholas B. Larson, Shannon K. McDonnell Many protein-coding genes are alternatively transcribed as multiple distinct isoforms, each translated as a unique protein product. For such genes, expression is intrinsically multivariate, and differential isoform abundance is partly attributable to genetics via splicing quantitative trait loci (sQTLs). Various disease traits have been associated with differential splicing, and sQTLs have been shown to be associated with disease genetics independently of eQTLs. While most sQTL association methods use omnibus multivariate testing, the resulting statistics are not amenable to statistical finemapping methods, which instead require Wald statistics that convey both the magnitude and direction of effect. Consequently, little is currently known about the finemapping of sQTLs at the transcriptomic level, including the number of unique causal sQTLs per gene. An alternative strategy is to test sQTL associations with each individual expression trait using standard linear regression methods and then apply a multi-trait finemapping analysis. To this end, we explored summary statistics from GTEx v8 sQTL analyses, using prostate tissue data based on Leafcutter intron-excision ratio expression phenotypes as an example analysis. Multivariate finemapping of relative expression values clustered by gene was performed using the Bayesian method fastPAINTOR for all genes exhibiting a significant overall sQTL association (G = 2401), and posterior inclusion probabilities (PIPs) were calculated for all SNPs within a 20 kb buffer of transcription start and end sites. The mean number of expression traits per gene was 10.8 (SD = 7.6). Among the 529 genes that had at least 1 SNP with a PIP > 0.5, over 50% had more than 2 SNPs with high PIPs.
These findings indicate that sQTL genetic architecture is complex. Future work will expand to all available tissue types and compile trends in genetic architecture in relation to relative isoform abundance.
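Finemapping needs per-variant effect estimates and standard errors rather than omnibus statistics. As a much simpler stand-in for the multi-trait fastPAINTOR model (not the method used above), the sketch below computes single-trait PIPs from Wald statistics using Wakefield approximate Bayes factors, assuming exactly one causal variant per region and equal prior probabilities for all variants. The prior effect standard deviation of 0.15 is an assumed default, and the function names and summary statistics are hypothetical.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Log Wakefield approximate Bayes factor (association vs. no association)."""
    v = se ** 2            # sampling variance of the effect estimate
    w = prior_sd ** 2      # assumed prior variance of the true effect
    r = w / (v + w)
    z = beta / se          # Wald z statistic
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def single_causal_pips(betas, ses, prior_sd=0.15):
    """PIPs assuming exactly one causal variant and equal prior odds per variant."""
    logs = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(logs)                             # shift for numerical stability
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    return [wt / total for wt in weights]


# Hypothetical summary statistics for three SNPs against one expression trait.
pips = single_causal_pips([0.30, 0.05, 0.28], [0.05, 0.05, 0.05])
print([round(p, 3) for p in pips])
```

A single-trait Bayes factor depends only on z squared, so effect direction becomes informative only when effects are modeled jointly across traits, as in the multi-trait analysis described above.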
35 Jieun Lee Sookmyung Women's University General Poster only Drowning in omics data? : Q-omics, a smart software for assisting oncology and cancer research Jieun Lee, Youngju Kim, Euna Jeong, and Sukjoon Yoon The growing availability of multi-level omics data has enabled data-driven studies of cancer drugs, targets, and biomarkers. Thus, comprehensive tools are needed that let oncologists and cancer scientists carry out extensive data mining without computational expertise. For this purpose, we have developed innovative software that enables user-driven analyses of cancer omics data, assisted by knowledge-based smart systems.

Publicly available multi-level omics data on mutations, gene/protein expression, patient survival, immune scores (tumor-infiltrating cells), drug screens, and RNAi (shRNA) and CRISPR screens of patient samples and cell lines were integrated from the TCGA, GDSC, CCLE, NCI, and DepMap databases. Q-omics provides a user-friendly interface for calculating and visualizing cross-associated data pairs retrieved from the integrated datasets. The optimal selection of samples, datasets, and/or other filtering options is guided by the software's knowledge bases, making it quick and easy to find significant associations between data pairs. Furthermore, built-in smart algorithms prioritize significant hits using consensus scoring. Consensus scoring, which applies multiple statistical tests across varied sample (or lineage) selections, enriches the hit list for robust, noise-free cross-associated pairs.

We believe that Q-omics provides simple but powerful tools for all areas of oncology and cancer research. The latest version of Q-omics software is available at http://qomics.sookmyung.ac.kr.

https://f1000research.com/posters/10-1268
36 Joshua Welch University of Michigan General Poster only Single-cell multi-omic velocity infers dynamic and decoupled gene regulation Chen Li, Maria Virgilio, Kathy Collins, Joshua Welch Single-cell multi-omic datasets, in which multiple molecular modalities are profiled within the same cell, provide a unique opportunity to discover the interplay between cellular epigenomic and transcriptomic changes. To realize this potential, we developed MultiVelo, a mechanistic model of gene expression that extends the popular RNA velocity framework by incorporating epigenomic data. MultiVelo uses a probabilistic latent variable model to estimate the switch time and rate parameters of gene regulation, providing a quantitative summary of the temporal relationship between epigenomic and transcriptomic changes. Fitting MultiVelo on single-cell multi-omic datasets from brain, skin, and blood cells revealed two distinct mechanisms of regulation by chromatin accessibility, quantified the degree of concordance or discordance between transcriptomic and epigenomic states within each cell, and inferred the lengths of time lags between transcriptomic and epigenomic changes.  
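To illustrate the kind of mechanistic model MultiVelo fits, the sketch below integrates a simplified three-step cascade with forward Euler steps: chromatin accessibility c, unspliced RNA u, and spliced RNA s, assuming dc/dt = k_c - alpha_c*c, du/dt = alpha*c - beta*u, and ds/dt = beta*u - gamma*s. This is an assumed, simplified form for illustration only. It omits MultiVelo's switch times and latent-time inference, and the function name and parameter values are hypothetical.

```python
def simulate(k_c, alpha_c, alpha, beta, gamma, t_max=10.0, dt=0.01):
    """Forward-Euler integration of chromatin (c) -> unspliced (u) -> spliced (s)."""
    c = u = s = 0.0
    trajectory = []
    for step in range(int(round(t_max / dt))):
        dc = k_c - alpha_c * c       # opening at rate k_c, closing at rate alpha_c
        du = alpha * c - beta * u    # transcription scaled by accessibility, minus splicing
        ds = beta * u - gamma * s    # splicing minus degradation (the RNA velocity)
        c, u, s = c + dc * dt, u + du * dt, s + ds * dt
        trajectory.append((round((step + 1) * dt, 10), c, u, s))
    return trajectory


# Hypothetical rate parameters; the analytic steady state is c = 1, u = 2, s = 4.
final_t, c, u, s = simulate(1.0, 1.0, 2.0, 1.0, 0.5, t_max=50.0)[-1]
print(final_t, round(c, 3), round(u, 3), round(s, 3))
```

At steady state the system settles at c = k_c/alpha_c, u = alpha*c/beta, and s = beta*u/gamma; before that, spliced RNA lags behind unspliced RNA, which in turn lags behind chromatin opening, the kind of time lag the abstract quantifies.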
37 Yunxian Liu Cedars-Sinai Medical Center, Los Angeles, California General Poster only Paradoxical Sex-Specific Patterns of Autoantibody Response to SARS-CoV-2 Infection Yunxian Liu, PhD, MS, Joseph E. Ebinger, MD, MS, Sandy Joung, MHDS, Min Wu, MS, Petra Budde, PhD, Jana Gajewski, MSc, Brian Walker, PhD, Rowann Mostafa, BS, Manuel Bräutigam, MSc, Franziska Hesping, BSc, Elena Schäfer, MSc, Ann-Sophie Schubert, MSc, Hans-Dieter Zucht, PhD, Jonathan Braun, Gil Y. Melmed, MD, MS, Kimia Sobhani, PhD, Moshe Arditi, Jennifer E. Van Eyk, PhD, Susan Cheng, MD, MPH*, Justyna Fert-Bober, PhD* Background. Pronounced sex differences in the susceptibility and response to SARS-CoV-2 infection remain poorly understood. Emerging evidence has highlighted the potential importance of autoimmune activation in modulating the acute response and recovery trajectories following SARS-CoV-2 exposure. Given that immune-inflammatory activity can be sex-biased in the setting of severe COVID-19 illness, we aimed to examine sex-specific autoimmune reactivity to SARS-CoV-2 in the absence of extreme clinical disease.

Methods. In this study, we assessed autoantibody (AAB) reactivity to 91 autoantigens previously linked to a range of classic autoimmune diseases in a cohort of 177 participants (65% women, 35% men, mean age 35 years) with confirmed prior SARS-CoV-2 infection, based on the presence of antibodies to the SARS-CoV-2 nucleocapsid protein. Data were compared with 53 pre-pandemic healthy controls (49% women, 51% men). For each participant, socio-demographic data, serological analyses, SARS-CoV-2 infection status, and COVID-19-related symptoms were collected by electronic survey.

Results. In multivariable analyses, we observed sex-specific patterns of autoreactivity associated with the presence or absence, as well as the timing and clustering, of symptoms associated with prior COVID-19 illness. Notably, whereas the overall AAB response was more prominent in women following asymptomatic infection, the breadth and extent of AAB reactivity were more prominent in men following at least mildly symptomatic infection. Moreover, the observed reactivity included distinct antigens sharing molecular homology with SARS-CoV-2.

Conclusion. Our results reveal that prior SARS-CoV-2 infection, even in the absence of severe clinical disease, can lead to a broad AAB response that exhibits sex-specific patterns of prevalence and antigen selectivity. Further understanding of the nature of triggered AAB activation among men and women exposed to SARS-CoV-2 will be essential for developing effective interventions against immune-mediated sequelae of COVID-19.

https://doi.org/10.7490/f1000research.1118872.1
38 Colton McNinch Mayo Clinic General Poster only REAL-neoJunc, a Novel Bioinformatics Pipeline for Identification and Prioritization of Neoantigens from Aberrant RNA Splicing Isoforms Colton McNinch, Erik Jessen, Daniel Wickland, Barath Shreeder, Brian Necela, Keith Knutson, and Yan Asmann Cancer neoantigen vaccines are among the most promising next-generation immunotherapy agents. Neoantigens arise from protein-altering somatic mutations. Multiple bioinformatics frameworks, including our own REAL-neo {Ren et al.} pipeline, have been developed to identify neoantigens from single nucleotide mutations, small INDELs, and gene fusions. However, identifying neoantigens from aberrant RNA splicing events, another rich source of neoantigens, has been analytically difficult.

Accurate identification and prioritization of splicing neoantigens from short-read RNA-seq data has been challenging due to the absence of full-length transcriptome profiles. Here, we present a computational pipeline, REAL-neoJunc, that leverages somatic splice site mutations detected in tumor exomes to predict splice acceptor and donor gain and loss, as well as the consequent aberrant splicing isoforms. In addition, REAL-neoJunc discovers de novo splicing isoforms without underlying DNA splice site mutations by identifying novel junction-supporting reads. The putative aberrant splicing isoforms are quantified by Salmon {Patro et al.}, and only the expressed isoforms are translated in silico to obtain putative neoepitopes. The binding affinities between neoepitopes and patient-specific HLAs (class I or class II) are predicted using multiple algorithms {Hundal et al.}.

We applied REAL-neoJunc to multiple TCGA data sets. Aberrant splicing neoantigens substantially increased the total neoantigen load in all cancer types examined and were more prevalent in cancers driven by small mutations (e.g., lung and liver cancers) than in those driven by genomic rearrangements and copy number alterations (e.g., breast cancer).

We selected 15 of the top splicing neoantigens predicted to bind HLA-A*02:01 for validation using in vitro T2 binding assays. All 15 neoantigens were validated with strong binding affinities, and 7 (46.7%) epitopes bound more strongly than the positive control, a peptide from influenza virus.

 
39 Jasmine Olvany Case Western Reserve University General Poster only Detecting and assessing the epidemiology of asymptomatic malaria in regions of sub-Saharan Africa. Jasmine M. Olvany, Sarah A. Tishkoff, Peter A. Zimmerman, Scott M. Williams The highest malaria burden in the world is in Africa, despite a massive push for elimination. One hypothesis for why elimination efforts have not been as effective as expected is the existence of a parasite reservoir in asymptomatic individuals. In this study, we propose to identify and characterize this asymptomatic population using a method that detects Plasmodium infections retrospectively from human whole genome sequence (WGS) data. Because the four main human malaria species, and hence those of interest to us (P. falciparum, P. vivax, P. malariae, and P. ovale), are genetically similar to each other, we identified species-informative regions in both the nuclear and mitochondrial genomes to achieve species specificity. Identifying these regions required careful filtering to remove all sequences shared among the four species' reference genomes at a 150 bp resolution. Any sequence remaining after filtering was considered unique to a single species and used to build an index for alignment with bowtie2. We tested the utility of both the mitochondrial and nuclear genomes for detecting Plasmodium infection in the unmapped reads of human WGS data. We found that the mitochondrial genome was useful for detecting infection by any Plasmodium species. However, the nuclear genomes have more genetic variation and thus yielded more species-specific sequences. Once the methodology is further refined, we will be able to detect infections in the TOPMed Africa6K dataset, which represents six different countries in sub-Saharan Africa.
Once the infection status is determined, we plan to study the infection composition, regional differences, month-by-month variation, and level of infection in asymptomatic individuals in comparison to known symptomatic trends and prevalence.  
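The filtering step described above can be sketched on toy sequences: split each species' reference into fixed-length windows and keep only those whose exact sequence, on either strand, occurs nowhere in the other species' references. This is one possible reading of the described filter, not the authors' code. The function names and species labels are hypothetical, and in practice the 150 bp windows would be computed over full genome assemblies with more memory-efficient data structures.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def kmers(seq, k, step=1):
    """All length-k substrings taken every `step` bases (partial tail windows dropped)."""
    return {seq[i:i + k] for i in range(0, len(seq) - k + 1, step)}


def species_unique_windows(references, k=150):
    """Per species, keep non-overlapping k-bp windows absent (either strand) from all other species."""
    every_kmer = {sp: kmers(seq, k) for sp, seq in references.items()}
    unique = {}
    for sp, seq in references.items():
        others = set()
        for other, found in every_kmer.items():
            if other != sp:
                others |= found
                others |= {reverse_complement(x) for x in found}
        unique[sp] = {w for w in kmers(seq, k, step=k) if w not in others}
    return unique


# Toy references with k = 4 instead of 150 bp.
toy = {"P_a": "ACGTACGTTTTT", "P_b": "ACGTGGGGCCCC"}
print(species_unique_windows(toy, k=4))
```

Comparing each window against every offset of the other references (step 1), rather than only against their non-overlapping windows, ensures a shared sequence is removed even when it sits at a different position in another genome.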
40 So Young Ryu University of Nevada, Reno General Poster only Evaluating missing data imputation methods for mass spectrometry data Sijia Qiu, So Young Ryu Mass spectrometry can provide valuable information to quantify proteins and measure protein post-translational modification site stoichiometry in biological mixtures. However, the presence of missing protein/peptide abundances and the co-existence of various missingness mechanisms in mass spectrometry data make it difficult to analyze these data properly. Several missing data imputation approaches for mass spectrometry data have been developed previously. However, an approach that properly evaluates missing data imputation methods in various aspects is still lacking. Such an evaluation approach would help researchers determine the best missing data imputation approaches for their data.

Here, we provide an approach that evaluates currently available missing data imputation approaches in several aspects. The proposed approach utilizes many real samples with known relative ratios, as well as simulated datasets. The real samples incorporated in this evaluation include, but are not limited to, 1) protein mixtures with known mixture ratios of samples from different species; 2) spiked-in proteins in complex mixtures; and 3) standard mixtures. The simulations consider various missingness mechanisms: NMAR (Not Missing At Random), MAR (Missing At Random), and mixtures of NMAR and MAR. The proposed approach provides several metrics (e.g., root mean squared bias and variance estimation) that measure the performance of missing data imputation approaches.

This study will benefit both bioinformaticians and biomedical researchers. Biomedical researchers can choose appropriate imputation approaches for their experiments based on the guidelines provided by this study. Automating the proposed evaluation approach can help bioinformaticians easily compare their proposed methods with currently available missing data imputation methods (e.g., quantile regression imputation of left-censored data, k-nearest neighbors, random forest, and minimum or half-minimum imputation) in various scenarios.
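A minimal sketch of one evaluation loop in the spirit of the approach above: impose missingness on known values, impute, and score the imputed entries against the truth. Assumptions: left-censoring below a detection threshold stands in for NMAR; a constant dropout probability (strictly MCAR, a special case of MAR) stands in for random missingness; half-minimum imputation is the method under test; and the score is the root mean squared error over the masked entries, which may differ from the authors' root mean squared bias metric. All names are hypothetical.

```python
import math
import random


def impose_left_censoring(values, threshold):
    """NMAR stand-in: intensities below a detection threshold go missing (None)."""
    return [v if v >= threshold else None for v in values]


def impose_mcar(values, rate, rng):
    """Random-dropout stand-in: each value goes missing with probability `rate`."""
    return [None if rng.random() < rate else v for v in values]


def half_minimum_impute(values):
    """Replace missing entries with half of the smallest observed value."""
    observed = [v for v in values if v is not None]  # assumes at least one observed value
    fill = 0.5 * min(observed)
    return [fill if v is None else v for v in values]


def rmse_on_masked(truth, observed, imputed):
    """Root mean squared error between imputed and true values at masked positions."""
    errors = [(imp - t) ** 2 for t, obs, imp in zip(truth, observed, imputed) if obs is None]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0


# Hypothetical intensities with known true values.
rng = random.Random(0)
truth = [rng.lognormvariate(10, 1) for _ in range(100)]
scenarios = [("left-censored", impose_left_censoring(truth, sorted(truth)[20])),
             ("random dropout", impose_mcar(truth, 0.2, rng))]
for name, observed in scenarios:
    imputed = half_minimum_impute(observed)
    print(name, round(rmse_on_masked(truth, observed, imputed), 1))
```

Swapping `half_minimum_impute` for other imputation functions with the same signature would let several methods be scored under the same missingness scenarios.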
 
41 Sharifa Sahai Harvard Medical School General Poster only Multimodal AI for Renal Allograft Biopsy

Sharifa Sahai, Richard J. Chen, Jana Lipova, Kuan Chen, Tiffany Chen, Judy J. Wang, Drew Williamson, Ming Y. Lu, Astrid Weins, Faisal Mahmood
According to the Global Observatory on Donation and Transplantation, there are over 100,000 kidney transplants annually, with over 24,000 in the United States alone. The standard of care for the assessment of pre-transplant kidneys and post-transplant rejection is the manual assessment of renal biopsies. However, such manual assessment suffers from large inter- and intra-observer variability (κ=0.22). Such variability can have dire consequences, ranging from under- and over-treatment to partial or full transplant rejection or even death. Moreover, renal allograft assessment is a complicated process involving multiple tissue stains and several modalities and requires the expertise of renal pathologists. Such expertise is often unavailable in low-resource settings, which can delay diagnosis and treatment. Here, we propose MANTA (Multimodal AI for Renal Transplant Assessment), an objective and automated method for assessing renal allograft biopsies to screen for renal allograft rejection. MANTA uses weakly supervised deep-learning multimodal fusion of gigapixel whole slide images (WSIs), with patients' diagnoses as labels. MANTA does not require pixel-, patch-, or ROI-level labels for training. MANTA fuses morphological features from H&E, PAS, Masson's trichrome, and Jones silver stains to produce holistic predictions. We demonstrate that MANTA achieves an AUC of 0.95 for assessing interstitial fibrosis and tubular atrophy, 0.82 for T cell-mediated rejection, and 0.81 for antibody-mediated rejection. We have gathered a large cohort of renal allograft biopsies from the USA (n=11,974 WSIs, N=909 patients) and an independent test cohort from Turkey (n=1,025 WSIs, N=231 patients) in order to assess clinical performance across patient populations, sample preparation protocols, and slide scanning instrumentation.
The evaluation of this AI system paves the way for clinical trials to establish the efficacy of AI-assisted renal allograft assessment to improve kidney transplant outcomes. https://drive.google.com/file/d/1UhWoqUKu3Tspr1wKLERr3djS-DmAD2Pf/view?usp=sharing
42 Katrin Sangkuhl Stanford General Poster only Pharmacogenomics Clinical Annotation Tool (PharmCAT): Current State and Future Directions  Katrin Sangkuhl, Michelle Whirl-Carrillo, Binglan Li, Ryan Whaley, Mark Woon, Karl Keat, Scott Dudek, Anurag Verma, Shefali Verma, Sony Tuteja, Marylyn D. Ritchie, Teri E. Klein Implementation of pharmacogenomics (PGx) into the clinical workflow is critical for achieving the goal of precision medicine, which can be facilitated by automatic methods to translate genetic test results into clinical actions such as prescribing recommendations. The Pharmacogenomics Clinical Annotation Tool (PharmCAT, https://pharmcat.org) was developed to extract PGx variants from a VCF file derived from sequencing or genotyping technologies, infer genotypes and corresponding phenotypes, and connect those with clinical or prescribing recommendations.

PharmCAT v1.2.1 packages pharmacogene allele definitions, function assignments, phenotype mappings, and prescribing recommendations pulled from the Clinical Pharmacogenetics Implementation Consortium (CPIC) database. PharmCAT has a modular design: it can be run end to end, from VCF input to report output, but it also accepts external genotype or phenotype data and can produce intermediate output at each step. The Named Allele Matcher and Phenotyper modules, respectively, predict genotypes from a user VCF file using the CPIC allele definitions and map those genotypes to the corresponding phenotypes. The Reporter connects phenotypes to CPIC prescribing recommendations and combines them with gene information and disclaimers/warnings to generate an HTML file that can be converted to a PDF report designed for readability by clinicians and patients. PharmCAT provides a pre-processing tool that helps format a user's VCF file to meet the input requirements by normalizing variant representation, converting to the expected multi-allelic format, and splitting multi-sample files into single-sample files.

UK Biobank and Penn Medicine BioBank data will be used to further test the newest release and to create a genotype-phenotype output format for biobank data analysis to determine population frequencies. We will extend the documentation to help users run PharmCAT. Another future focus is a FHIR-compatible report output for incorporation into electronic health record systems.

This work is supported by the NIH grant U24HG010862. 
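The Phenotyper step described above maps a diplotype to a phenotype through allele function assignments. The toy sketch below does this for CYP2C19 with a small, hand-entered subset of function assignments (*1 normal function, *2 and *3 no function, *17 increased function). It is not PharmCAT's code or API, the names are hypothetical, and the tables are illustrative only; real use should rely on the current CPIC allele function and phenotype tables that PharmCAT packages.

```python
# Illustrative subset only; real function assignments come from CPIC tables.
ALLELE_FUNCTION = {"*1": "normal", "*2": "no", "*3": "no", "*17": "increased"}

# Keys are the two allele functions in sorted (alphabetical) order.
PHENOTYPE_BY_FUNCTIONS = {
    ("increased", "increased"): "Ultrarapid Metabolizer",
    ("increased", "normal"): "Rapid Metabolizer",
    ("normal", "normal"): "Normal Metabolizer",
    ("increased", "no"): "Intermediate Metabolizer",
    ("no", "normal"): "Intermediate Metabolizer",
    ("no", "no"): "Poor Metabolizer",
}


def cyp2c19_phenotype(diplotype):
    """Map a diplotype such as '*1/*2' to a phenotype via allele function."""
    allele_a, allele_b = diplotype.split("/")
    key = tuple(sorted((ALLELE_FUNCTION[allele_a], ALLELE_FUNCTION[allele_b])))
    return PHENOTYPE_BY_FUNCTIONS[key]


print(cyp2c19_phenotype("*2/*2"))
```

Sorting the two function labels makes the lookup independent of allele order, so "*17/*1" and "*1/*17" map to the same phenotype.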
 
43 Thomas LaFramboise Case Western Reserve University General Poster only Systematic Interrogation of Relationships between the Mitochondrial Genome and Therapeutic Response in Cancer Maryssa Shanteau-Jackson, Tatiana Dombrovski, Vedant Thorat, Thomas LaFramboise Tandem advances in high-throughput DNA/RNA sequencing and high-throughput drug screening have facilitated the identification of genome characteristics that render tumors sensitive to specific classes of anti-cancer drugs. These “precision medicine” efforts have almost exclusively focused on the nuclear compartment of the human genome, ignoring the ~16.5 kb mitochondrial chromosome and the 13 protein-coding genes, 22 tRNAs, and 2 rRNAs encoded by the mitochondrial genome (mtDNA), even though mitochondrial metabolism is important for tumor growth.

Here we propose a strategy to test the hypothesis that certain mitochondrial genome characteristics confer sensitivity to some classes of cancer therapeutics. We present preliminary in silico work that analyzes drug sensitivity data from >5,000 compounds tested in approximately 2,400 cancer cell lines and assesses relationships between these sensitivity measures and mtDNA/mtRNA characteristics.
https://doi.org/10.7490/f1000research.1118882.1
44 Adam Sieradzan University of Gdańsk General Poster only UNICORN Adam Sieradzan

UNRES is a well-established force field for protein simulation that has been under continuous development for over 30 years. UNRES is a highly reduced physics-based model that has been very successful in physics-based prediction of protein structure, in studying the mechanisms and kinetics of protein folding, and in investigating biological processes. In 2015, the UNRES model was adapted to nucleic acids, resulting in the NARES-2P force field. Recently, the SUGRES-1P force field for sugars was developed. However, interactions between cellular components were missing. In 2018, interactions between nucleic acids and proteins were introduced. This marked the start of a new force field, the UNIted COarse gRaiNed (UNICORN) force field, for all the most important cellular components: proteins, nucleic acids, sugars, lipids, and ions. The poster discusses the UNICORN model and presents the current state of its development, including current biological applications and limitations.

This work was supported by NCN OPUS 2017/27/B/ST4/00926.
 
45 Pankhuri Singhal University of Pennsylvania General Poster only Clinical event prediction in complex genetic traits leveraging longitudinal EHR and genomic data in Penn Medicine BioBank Pankhuri Singhal, Anurag Verma, Dokyoon Kim, Marylyn D. Ritchie      Clinical event prediction remains a challenge due to variation across patient health trajectories. While approaches such as clustering aim to classify patient-similarity based on predefined health variables, few studies consider the correlation architecture across multivariate longitudinal electronic health record (EHR) data. By understanding the complex relationships between lab measurements, health and procedural codes, and prescriptions over time in a health system, in addition to genetic data, we can begin to detect patterns for predicting clinical events for a given patient. Such health forecasting aims to supplement the current “one-size-fits-all” approach to diagnostics that results in reactive care with too small an intervention window available.

     In this study, we leverage the Penn Medicine BioBank’s 45,000 exome-sequenced individuals to ask questions about health patterns at the individual and health-system levels. We propose a framework that integrates each type of health data into a knowledge graph in order to extract intrinsic relationships using a correlation engine. As part of this analysis, we present preliminary findings exploring the disease comorbidity architecture in a cross-section of the longitudinal data. Using the Ising model, a type of Markov random field (an undirected probabilistic graphical model), we constructed disease-disease and disease-gene co-occurrence maps using ICD codes and gene-burden scores derived from putative loss-of-function variants. Our results reveal clusters of known disease connections as well as novel pairwise relationships between genes and diseases. Our findings shed light on the effect of comorbid conditions on the trajectory of adverse outcomes.

     In the next steps of this research, we will conduct time-series analysis on the significant interactions identified through the knowledge graph. Using the co-occurrence of health variables will allow us to identify trends in disease trajectories. We hope to make prognostic predictions about subsequent clinical events in patients and to identify the most effective lines of treatment for disease prevention.

 
46 Paul Stewart Moffitt Cancer Center General Poster only iModMix, a network-based framework and R package for multi-omics analysis Paul A. Stewart, Noah Sulman, Christopher M. Wilson, Hayley Ackerman, Elsa R. Flores, Qian Li, Ann Chen, Brooke L. Fridley Combining multiple omics technologies provides a more detailed snapshot of biological processes, but these “multi-omics” data are traditionally analyzed individually. Although individual analysis can give some biological insight, it misses the opportunity to quantitatively integrate the data and harness their power simultaneously. Some progress has been made in developing tools for integrating multi-omics data, but these are primarily for genomics applications. Existing tools for integrating metabolomics data (e.g., IMPALA, MSEA) are enrichment-focused and ignore feature interactions. Given these shortcomings, we developed iModMix (integrative Modules for Multi-omics data), a framework and R package for analyzing multi-omic datasets. iModMix uses graphical lasso to build biologically meaningful networks from expression features (e.g., metabolites, proteins). This addresses the “hairball problem” by reducing the dimensionality of the network. Topological overlap measures (TOMs) are then generated for the reduced network, with features having high topological overlap if they are connected to similar neighbors. We then apply fuzzy correlation clustering to identify groups of related features (“modules”). This fuzzy approach allows features to belong to more than one module, just as proteins can belong to multiple pathways. These modules are empirically derived from the data and are not constrained by pre-defined pathways (e.g., KEGG). Importantly, modules from multi-omics datasets can be related to a phenotype and also between omic data types (e.g., metabolomics to proteomics).
To test this approach, we generated a novel proteometabolomics dataset of KRAS mutant lung tumors from genetically altered mice (n = 20). Consistent with known biology, iModMix identifies cysteine metabolism protein modules highly correlated with glutathione metabolite modules. Interestingly, we found protein modules highly correlated with metabolite modules consisting entirely of unidentified metabolites. Ongoing work is investigating iModMix’s ability to identify unknown metabolites based on associations with known modules.

https://f1000research.com/posters/10-1247
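The topological overlap step described above can be sketched in plain Python using the widely used WGCNA-style formula TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = sum over u of a_iu*a_uj and k_i is the connectivity of node i. The sketch assumes a symmetric adjacency matrix with values in [0, 1] (for example, absolute partial correlations from a graphical lasso fit). It is not iModMix's R implementation, and the function name is hypothetical.

```python
def topological_overlap(adj):
    """WGCNA-style unsigned topological overlap for a symmetric adjacency with values in [0, 1]."""
    n = len(adj)
    # Connectivity k_i: sum of adjacencies to all other nodes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        tom[i][i] = 1.0
        for j in range(i + 1, n):
            # l_ij: weighted count of neighbors shared by i and j.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Path graph 0 - 1 - 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(topological_overlap(path))
```

In the path 0-1-2, nodes 0 and 2 are not directly connected but share neighbor 1, so their overlap is 0.5; this is the sense in which features connected to similar neighbors have high topological overlap.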
47 Alexej Abyzov Mayo Clinic General Poster only CNVpytor: a tool for copy number variation detection and analysis from read depth and allele imbalance in whole-genome sequencing. Milovan Suvakov, Arijit Panda, Colin Diesh, Ian Holmes, Alexej Abyzov Detecting copy number variations (CNVs) and copy number alterations (CNAs) from whole-genome sequencing data is important for personalized genomics and treatment. CNVnator is one of the most popular tools for CNV/CNA discovery and analysis based on read depth. Herein, we present CNVpytor, an extension of CNVnator developed in Python. CNVpytor inherits the reimplemented core engine of its predecessor and extends visualization, modularization, performance, and functionality. Additionally, CNVpytor uses B-allele frequency likelihood information from single-nucleotide polymorphisms and small indels as additional evidence for CNVs/CNAs and as primary information for copy number-neutral losses of heterozygosity. CNVpytor is significantly faster than CNVnator, particularly for parsing alignment files (2-20 times faster), and has 20-50 times smaller intermediate files. CNV calls can be filtered using several criteria, annotated, and merged over multiple samples. Its modular architecture allows it to be used in shared and cloud environments such as Google Colab and Jupyter notebooks. Data can be exported into JBrowse, while a lightweight plugin version of CNVpytor for JBrowse enables nearly instant, GUI-assisted analysis of CNVs by any user. CNVpytor releases and source code are available on GitHub at https://github.com/abyzovlab/CNVpytor under the MIT license.
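B-allele frequency (BAF) is the fraction of reads supporting the alternative allele at a SNP. The sketch below, with hypothetical names, shows why BAF carries copy-number information: a heterozygous SNP is expected near 0.5 at copy number 2 but near 1/3 or 2/3 at copy number 3, and a copy number-neutral loss of heterozygosity leaves only homozygous BAFs near 0 or 1. It is a simplified illustration, not CNVpytor's likelihood model.

```python
def b_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads supporting the alternative (B) allele; None if no coverage."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else None


def expected_het_bafs(copy_number):
    """Expected BAFs of a heterozygous SNP carrying b = 1..CN-1 copies of the B allele."""
    return [b / copy_number for b in range(1, copy_number)]


def mean_baf_deviation(bafs):
    """Average |BAF - 0.5| over SNPs with coverage; larger values suggest allelic imbalance."""
    values = [abs(b - 0.5) for b in bafs if b is not None]
    return sum(values) / len(values) if values else None


print(expected_het_bafs(2))        # copy number 2: balanced heterozygote
print(expected_het_bafs(3))        # copy number 3: one-copy gain
print(b_allele_frequency(12, 24))  # hypothetical read counts at one SNP
```

A region with copy number 1 has no heterozygous states at all, which is why read depth is needed alongside BAF to tell a deletion from a copy number-neutral loss of heterozygosity.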
48 Marek Svoboda Geisel School of Medicine at Dartmouth General Poster only Internal oligo(dT) priming in bulk and single cell RNA sequencing Marek Svoboda, Hildreth R. Frost, Giovanni Bosco Significant advances in RNA sequencing have recently been made possible by the use of oligo(dT) primers for simultaneous mRNA enrichment and reverse transcription priming. The associated increase in efficiency has enabled more economical bulk RNA sequencing methods as well as the advent of high-throughput single cell RNA sequencing, now one of the most widely adopted new methods in the study of transcriptomics. However, the effects of off-target oligo(dT) priming on gene expression quantification have not been fully appreciated. In the present study, we describe the extent, possible causes, and consequences of internal oligo(dT) priming across multiple publicly available datasets obtained from a variety of bulk and single cell RNA sequencing platforms. To explore and address this issue, we developed a computational algorithm that identifies sequencing read alignments likely to have resulted from internal oligo(dT) priming and removes them from the data. Directly comparing filtered datasets with those obtained by an alternative method reveals significant improvements in gene expression measurement. Finally, we infer a list of genes whose expression quantification is most likely to be affected by internal oligo(dT) priming. https://drive.google.com/file/d/1mFcNpgDbKOKds-3vGDa7mAS7xSFq6X12/view?usp=sharing
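A common heuristic for flagging internal priming checks whether the genome just past the transcript's 3' end is A-rich, since oligo(dT) can anneal there instead of at the poly(A) tail. The sketch below assumes 0-based, end-exclusive coordinates, a 10-nt window, and a threshold of 7 adenines; these values and the function name are hypothetical and not necessarily those used in the study, and a practical filter would also consult annotated polyadenylation sites.

```python
def likely_internal_priming(genome, start, end, strand, window=10, min_a=7):
    """Flag an alignment [start, end) whose transcript 3' end abuts an A-rich genomic stretch."""
    if strand == "+":
        # Transcript 3' end is at `end`; look downstream on the reference strand for A's.
        flank = genome[end:end + window].upper()
        return flank.count("A") >= min_a
    # Minus strand: the transcript 3' end is at `start`; an A-run on the transcript
    # appears as a T-run upstream of `start` on the reference strand.
    flank = genome[max(0, start - window):start].upper()
    return flank.count("T") >= min_a


# Hypothetical reference with a genomic A-run right after a plus-strand alignment.
genome = "CCCCCGGGGG" + "AAAAAAAAAA" + "CCCCC"
print(likely_internal_priming(genome, 0, 10, "+"))
```

Alignments flagged this way would be candidates for removal before gene expression quantification, in line with the filtering described in the abstract.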
49 Michelle Whirl-Carrillo Stanford University General Poster only PharmGKB: Pharmacogene Allele Coordination Across PGx Resources  Michelle Whirl-Carrillo, Katrin Sangkuhl, Ryan Whaley, Mark Woon and Teri E. Klein The Pharmacogenomics Knowledgebase (PharmGKB; https://www.pharmgkb.org) is the largest publicly available resource for pharmacogenomics (PGx) discovery and implementation.  Its mission is to curate, annotate, integrate and disseminate knowledge regarding the influence of genetic variation on drug response. PharmGKB scientists manually curate the primary literature to capture details of published pharmacogenomic studies, including genetic alleles associated with drug phenotypes. 

Alleles of many pharmacogenes are referred to using the “star” nomenclature system. Haplotypes across the gene sequence are assigned names that begin with the HGNC gene symbol, followed by a “star” symbol and then a number, for example CYP2D6*2. The Pharmacogene Variation Consortium (PharmVar) catalogs sequence variation and uses a strict set of rules to assign star allele names. The allele definitions in PharmVar are the standards used in published literature, PGx clinical test reports and other PGx resources, including PharmGKB, the Clinical Pharmacogenetics Implementation Consortium (CPIC) and the Pharmacogenomics Clinical Annotation Tool (PharmCAT). As new alleles are cataloged and named by PharmVar, these resources need to stay in sync to maintain consistency across literature annotations, PGx prescribing guidelines and patient genomes annotated with PGx knowledge.

PharmGKB has developed a process to coordinate allele updates across PharmVar, PharmGKB, CPIC and PharmCAT.  When a new version of the PharmVar database is released, PharmGKB uses the PharmVar API to pull the updated information.  PharmGKB scientists review the new alleles and/or changes to existing allele definitions, reconcile any issues and then submit the updates to the PharmGKB DB.  When the new haplotype submission is processed, PharmGKB web pages are updated, a new pharmacogene allele definition download file is posted to the PharmGKB and CPIC websites, and the updates are submitted to the CPIC DB.  Additionally, the process triggers an allele definition update in the PharmCAT repository.  New releases of the CPIC DB and PharmCAT software follow.
 
50 Brenda Xiao University of Pennsylvania General Poster only Evaluation of cardiometabolic polygenic risk scores and Mendelian randomization study demonstrates putative causal relationships across women’s health conditions Brenda Xiao, Digna R. Velez Edwards, Anastasia Lucas, Theodore Drivas, Kathryn Gray, Brendan Keating, Chunhua Weng, Gail P. Jarvik, Hakon Hakonarson, Leah Kottyan, Noemie Elhadad, Regeneron Genetics Center, Wei-Qi Wei, Yuan Luo, Dokyoon Kim, Marylyn Ritchie, Shefali Setia Verma Cardiometabolic diseases are highly comorbid and associated with poor health outcomes. However, the relationship between the genetic burden that predisposes individuals to cardiometabolic diseases and the risk of women’s health conditions, such as breast cancer, endometriosis and many pregnancy-related complications, remains understudied. We calculated polygenic risk scores (PRS) for women in two datasets, the Penn Medicine BioBank (PMBB; 21,837 samples) and the electronic MEdical Records and GEnomics (eMERGE; 49,171 samples) network, for four cardiometabolic phenotypes: body mass index (BMI), coronary artery disease (CAD), type 2 diabetes (T2D) and hypertension (based on blood pressure measurements). We then tested the association of each cardiometabolic PRS with 23 obstetric and gynecological conditions in participants stratified by European and African ancestry, and we investigated age-related trends in disease prevalence in the high and low PRS groups. Our analysis identified 14 significant associations in both cohorts, reflecting potential shared biology between common cardiometabolic diseases and women’s health conditions. The BMI PRS was associated with endometrial cancer, gestational diabetes, and polycystic ovarian syndrome (PCOS). Other significant associations include the CAD PRS with breast cancer and the T2D PRS with gestational diabetes and PCOS. 
Mendelian randomization on the significant associations showed causal effects between cardiometabolic diseases and women’s health conditions. The most significant associations were between T2D and gestational diabetes (p = 2.6 × 10⁻⁴⁴; beta = 0.61) and between BMI and endometrial cancer (p = 5.3 × 10⁻¹⁴; beta = 0.58). Individuals with high PRS were also more likely to develop conditions such as PCOS and gestational hypertension at earlier ages. Our analyses show significant differences in the strength of association across ancestries. Recognizing these ancestral differences in this dual burden within clinical care could help improve health outcomes in women living with comorbid conditions.  
51 Zhongming Zhao University of Texas Health Science Center at Houston General Poster only Integrative genetic, functional genomic, and single-cell transcriptomic analyses identified CXCR6 as a marker gene in COVID-19 severity Yulin Dai, Junke Wang, Hyun-Hwan Jeong, Wenhao Chen, Peilin Jia, Zhongming Zhao

The coronavirus disease 2019 (COVID-19) is an infectious disease that mainly affects the host respiratory system, with ~80% asymptomatic or mild cases and ~5% severe cases. Recent genome-wide association studies (GWAS) have identified several genetic loci associated with severe COVID-19 symptoms, such as the 3p21.31 locus (SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, and XCR1). We implemented integrative approaches, including transcriptome-wide association studies (TWAS), colocalization, and functional element prediction analyses, to interpret the genetic risks in lung, whole blood, and immune cells. We used two independent GWAS datasets, from the Host Genetics Initiative (round 4, A2) and the Severe COVID-19 GWAS Group. To understand context-specific molecular alterations, we further performed deep learning-based single cell transcriptomic analyses on a bronchoalveolar lavage fluid (BALF) dataset from moderate and severe COVID-19 patients. In the TWAS, colocalization, and functional analyses, we found that CXCR6 has a protective effect in lung and a risk effect in whole blood. In lung resident memory CD8+ T (TRM) cells, we found a 2.24-fold decrease in cell proportion and lower expression of CXCR6 (fold change = 0.56, two-sided Wilcoxon p = 1.8 × 10⁻¹⁸) in severe than in moderate patients. Pro-inflammatory transcriptional programs, apoptosis, and hypoxia pathways were highlighted in TRM cells as patients transitioned from the moderate to the severe group. We illustrated one potential mechanism by which host genetic variants or other unknown risk factors might affect COVID-19 severity. Such factors may alter CXCR6 expression and lung TRM cell proportion and stability, thereby impairing the first-line defense in the lung. https://doi.org/10.7490/f1000research.1118891.1
52 Kevin Jacobs Deepcell Bio Workshop: Image-based profiling: a powerful and challenging new data type Poster only A deep learning cell classification and sorting platform for single cell high-dimensional morphology characterization Kevin B Jacobs, Andreja Jovic, Kiran Saini, Anastasia Mavropoulos, Ryan Chow, Simo Zhang, Esther Lee, Michael Phelan, Jeanette Mei, Janifer Cruz, Chassidy Johnson, Nianzhen Li, Thomas J. Musci, Mahyar Salek, Maddison (Mahdokht) Masaeli
Although cell morphology is often the clinical gold standard for the diagnosis and prognosis of many diseases and conditions, morphology assessment relies on low-throughput, semi-quantitative approaches. It has also seen limited application in combination with comprehensive molecular and functional characterization methods. This gap stems from the largely manual process of collecting morphology information and from limitations of cell sorting methods that can damage or perturb cells. We present a platform that addresses these limitations, based on a novel microfluidic optical device capable of high-throughput cell imaging and sorting. The hardware is complemented by several components:
- a machine learning infrastructure capable of real-time analysis of cell images to generate high-dimensional morphologic descriptors and classifications
- a machine learning-assisted human image annotation tool to facilitate model training
- a comprehensive atlas of pre-labeled cell images
- a library of pre-trained models for specific biological applications

Our platform yields populations of cells that are label-free, viable, and minimally perturbed functionally, allowing sorted cells to be recovered and further characterized by molecular and functional assays. Additionally, cell images can be used to generate high-dimensional morphological profiles that reveal and isolate previously unrecognized heterogeneous cell populations. To demonstrate the platform’s capabilities, we imaged cells and trained a neural network classifier to identify and enrich malignant cells from dissociated tumor cell samples of non-small cell lung cancer. We verified enrichment of malignant cells by performing RNA and DNA analysis on the enriched samples. High-dimensional morphology analysis demonstrated morphological heterogeneity within the malignant cell population. 
Our platform can sort cells to enrich for these morphology-based subpopulations, which can then be further characterized to link high-dimensional morphology to molecular profiles for a deeper understanding of disease complexity.  
53 Hannah Spitzer Institute of Computational Biology, Helmholtz Center Munich Workshop: Image-based profiling: a powerful and challenging new data type Poster only Unbiased analysis of highly multiplexed image data reveals links between transcription and subnuclear organisation Hannah Spitzer, Scott Berry, Lucas Pelkmans, Fabian J Theis Highly multiplexed microscopic imaging holds enormous promise for understanding biological systems. Yet obtaining insights from spatially resolved high-dimensional datasets in an unbiased manner remains challenging, as perturbations or cell-cycle stages confound the molecular profiles. Here, we introduce a framework based on conditional variational autoencoders (cVAE) for the automated assignment of pixels to invariant cellular landmarks. This allows unbiased identification and quantitative comparison of the size, shape, molecular composition, and spatial organisation of subcellular structures across perturbations, cell-cycle stages and other cell states. When applied to high-resolution 4i data, this approach reveals how subnuclear organisation changes upon transcriptional perturbations. By integrating information across the multicellular, cellular and subcellular scales, we uncover intimate links between membrane-less organelles and the bulk transcriptional output of single cells. https://hmgubox2.helmholtz-muenchen.de/index.php/s/LGyJFAWMfjtDxFN
54 Luke Ternes Oregon Health and Science University Workshop: Image-based profiling: a powerful and challenging new data type Poster only Extracting more biologically relevant features from multiplexed imaging with a Multi-Encoder Variational AutoEncoder (ME-VAE) Luke Ternes, Mark Dane, Marilyne Labrie, Gordon Mills, Joe Gray, Laura Heiser, Young Hwan Chang Emerging high-dimensional multiplexed imaging techniques enable measurement of the expression and subcellular spatial distribution of tens of markers in single cells and facilitate our understanding of heterogeneous cell phenotypes. Encoding single cell representations as quantitative measurements is an essential part of image-based cell phenotyping. However, robust methods for extracting relevant features are lacking, so defining suitable representations that capture complex high-dimensional multiplexed imaging features remains a challenge. Variational autoencoders (VAEs) can create relevant image descriptors by encoding features into a compressed latent space, and they outperform handcrafted features for differentiating data. Despite the success of VAEs on related tasks such as tissue feature extraction, VAEs struggle to identify biologically informative features in single cell imaging data. The cause is the large amount of uninformative technical variation to which these architectures are hypersensitive. To extract biologically meaningful high-dimensional representations, we propose a multi-encoder VAE (ME-VAE) for single cell multiplexed image analysis. It passes transformed image pairs through parallel encoders as a self-supervised signal to extract transform-invariant features. Using parallel encoders and a shared latent space encourages mutual information between transformations and discourages unshared, transformation-specific information. 
We demonstrate that the ME-VAE improves the clusterability of known cell populations from different ligand-treated populations, removes undesired features from the latent space, and better extracts novel biologically relevant features. Our ME-VAE outperforms traditional image features and standard VAEs. It also outperforms current state-of-the-art approaches such as β-VAEs and Invariant C-VAEs, which were designed to produce interpretable latent spaces with less noisy features. We also show how the ME-VAE can be used to interpret key cellular differences and guide the development of new feature representations, which improve performance compared with traditional imaging features. Finally, we apply the ME-VAE to other multiplexed imaging modalities and demonstrate the generalizability of the proposed approach.  https://doi.org/10.7490/f1000research.1118845.1
55 Gregory Way University of Colorado Anschutz Workshop: Image-based profiling: a powerful and challenging new data type Poster only Morphology and gene expression profiling provide complementary information for mapping cell state Gregory P. Way, Ted Natoli, Adeniyi Adeboye, Lev Litichevskiy, Andrew Yang, Xiaodong Lu, Juan C. Caicedo, Beth A. Cimini, Kyle Karhohs, David J. Logan, Mohammad Rohban, Maria Kost-Alimova, Kate Hartland, Michael Bornholdt, Niranj Chandrasekaran, Marzieh Haghighi, Shantanu Singh, Aravind Subramanian, Anne E. Carpenter Deep profiling of cell states can provide a broad picture of biological changes that occur in disease, mutation, or in response to drug or chemical treatments. Morphological and gene expression profiling, for example, can cost-effectively capture thousands of features in thousands of samples across perturbations. However, it is unclear to what extent the two modalities capture overlapping versus complementary mechanistic information. Here, we use the L1000 and Cell Painting assays to profile gene expression and cell morphology, respectively, and perturb A549 lung cancer cells with 1,327 small molecules from the Drug Repurposing Hub across six doses. We determine that the two assays capture some shared and some complementary information in mapping cell state. We find that, compared with L1000, Cell Painting captures a higher proportion of reproducible compounds and has more diverse samples, but measures fewer distinct groups of features. In an unsupervised analysis, Cell Painting grouped more compound mechanisms of action (MOA), whereas in a supervised deep learning analysis, L1000 predicted more MOAs. In general, the two assays together provide a complementary view of drug mechanisms for follow-up analyses. 
Our analyses answer fundamental biological questions comparing the two biological modalities and, given the numerous applications of profiling in biology, provide guidance for planning experiments that profile cells for detecting distinct cell types, disease phenotypes, and response to chemical or genetic perturbations. https://drive.google.com/file/d/1haMT9X5PlhQbwa7jHDDBfDzjn38LbWze/view?usp=sharing
56 Junhao Wen University of Pennsylvania Workshop: Image-based profiling: a powerful and challenging new data type Poster only Genetic heterogeneity of MCI/AD and its manifestation in the general population Junhao Wen, Zhijian Yang, Ahmed Abdulkadir, Guray Erus, Elizabeth Mamourian, Yuhan Cui, Yong Fan, Andrew J. Saykin, Marylyn D. Ritchie, Li Shen, David A. Wolk, Haochang Shou, Ilya Nasrallah, and Christos Davatzikos
State-of-the-art data-driven machine learning methods have dissected neuroanatomical heterogeneity in Alzheimer's disease (AD) and its prodromal stage, mild cognitive impairment (MCI) (Vogel et al., 2021; Wen et al., 2021; Yang et al., 2021; Young et al., 2018; Zhang et al., 2016). However, the extent to which this neuroanatomical heterogeneity is attributable to underlying genetic heterogeneity remains unclear. We previously proposed a novel deep learning semi-supervised clustering method, termed Smile-GAN. Applying Smile-GAN to ADNI data yielded four subtypes in AD and MCI: P1, individuals with normal brain anatomy; P2, individuals with medial temporal-spared mild diffuse atrophy; P3, individuals with focal medial temporal lobe atrophy; and P4, individuals with advanced atrophy (Yang et al., 2021). The current study sought to elucidate the neuroanatomical manifestation and genetic profiles of the four MCI/AD signatures in the general population. https://f1000research.com/posters/10-1240
57 Steven Brenner University of California, Berkeley Workshop: Social, Technical, and Ethical Challenges in Biomedical Data Privacy Poster only Biological discovery and consumer genomics databases activate latent privacy risk in functional genomics data Zhiqiang Hu, Steven E. Brenner The privacy risks from individuals’ genomes have garnered increasing attention. Recent research studies and forensic investigations have underscored the ability to re-identify a person using genomically identified relatives and quasi-identifiers, such as sex, birthdate and zip code. However, summary omics data, such as gene expression values and DNA methylation sites, are generally treated as safe to share, with low privacy risks, even though research studies have indicated that they could be linked to existing genomes. We have demonstrated that some types of summary omics data can be accurately linked to a unique genome. We developed methods to match such data against genotypes in consumer genealogy databases using those databases’ restricted tools. Thus, the theoretical privacy concerns regarding summary omics data are now practically relevant. The ability to link sets of quasi-identifiers can reveal a research participant’s identity and protected health information. Most importantly, such risks increase over time, activated by new techniques, new knowledge, and new databases. Thus, public omics data may become privacy time bombs: safe at the time of distribution, but increasingly likely to compromise personal information. The need to preserve individuals’ genomic privacy for their lifetime and beyond (for descendants and relatives) poses unique challenges to the effective sharing of high-throughput molecular data. http://compbio.berkeley.edu/poster/200103_StevenSE_Privacy_Poster_v01_2_Feb_20_HZ.pdf