Posterboard # Entry Id Presenting Poster Author First Name Presenting Poster Author Last Name Session/Workshop Area Abstract Type Last Name of First Author Abstract Title List all authors (first name first with names separated by commas and proper capitalization) in the order they appear on the abstract. Please do NOT list affiliations or addresses in this field. Author affiliations (in order of the list of authors). Please separate affiliations with commas. Abstract (300 words or less) Poster DOI or URL
1 238 Yixing Jiang Artificial Intelligence in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface Accepted proceedings paper with poster presentation Jiang VetLLM: Large Language Model for Predicting Diagnosis from Veterinary Notes Yixing Jiang, Jeremy A. Irvin, Andrew Y. Ng, James Zou Stanford University Lack of diagnosis coding is a barrier to leveraging veterinary notes for medical and public health research. Previous work has been limited to developing specialized rule-based or customized supervised learning models to predict diagnosis codes, which is tedious and not easily transferable. In this work, we show that open-source large language models (LLMs) pretrained on general corpora can achieve reasonable performance in a zero-shot setting. Alpaca-7B achieves a zero-shot F1 of 0.538 on CSU test data and 0.389 on PP test data, two standard benchmarks for coding from veterinary notes. Furthermore, with appropriate fine-tuning, the performance of LLMs can be substantially boosted, exceeding that of strong state-of-the-art supervised models. VetLLM, which is fine-tuned from Alpaca-7B using just 5,000 veterinary notes, achieves an F1 of 0.747 on CSU test data and 0.637 on PP test data. Notably, our fine-tuning is data-efficient: fine-tuning on only 200 notes can outperform supervised models trained with more than 100,000 notes. These findings demonstrate the great potential of LLMs for language processing tasks in medicine, and we advocate this new paradigm for processing clinical text.  
2 225 Milos Vukadinovic Artificial Intelligence in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface Accepted proceedings paper with poster presentation Vukadinovic Impact of Measurement Noise on Genetic Association Studies of Cardiac Function Milos Vukadinovic, Gauri Renjith, Victoria Yuan, Alan Kwan, Susan C. Cheng, Debiao Li, Shoa L. Clarke, David Ouyang Cedars-Sinai Medical Center, University of California Los Angeles, Stanford University Recent research has effectively used quantitative traits derived from imaging to boost the capabilities of genome-wide association studies (GWAS), providing further understanding of disease biology and various traits. However, phenotyping inherently carries measurement error and noise that could influence subsequent genetic analyses. This study focused on left ventricular ejection fraction (LVEF), a vital yet potentially imprecise quantitative measurement, to investigate how imprecision in phenotype measurement affects genetic studies. Several methods of acquiring LVEF, along with simulated measurement noise, were assessed for their effects on downstream genetic analyses. The results showed that introducing just 7.9% measurement noise could eliminate all genetic associations in an LVEF GWAS of almost forty thousand individuals. Moreover, a 1% increase in the mean absolute error (MAE) of LVEF reduced GWAS power as much as a 10% reduction in cohort sample size. Therefore, improving phenotyping accuracy is crucial to maximize the effectiveness of genome-wide association studies.  
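The attenuation effect described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' pipeline: the allele frequency, effect size, trait scale, and the independent Gaussian noise model are all hypothetical, and association is scored with the simple statistic n·r² (approximately chi-square with 1 degree of freedom) rather than a full GWAS regression.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def association_chi2(genotypes, phenotype):
    """Single-variant test statistic n * r^2 (approximately chi-square with 1 df)."""
    r = pearson_r(genotypes, phenotype)
    return len(genotypes) * r * r

rng = random.Random(0)
n = 20_000
maf = 0.3  # hypothetical minor allele frequency
# Additive genotype coding: number of minor alleles (0, 1 or 2).
genotypes = [sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
# Hypothetical true LVEF-like trait (%): small genetic effect plus biological variation.
true_trait = [60 + 1.0 * g + rng.gauss(0, 6) for g in genotypes]
# Measured trait = true trait + independent Gaussian measurement error (assumed noise model).
measured = [t + rng.gauss(0, 6) for t in true_trait]

chi2_true = association_chi2(genotypes, true_trait)
chi2_measured = association_chi2(genotypes, measured)
print(f"chi2 (true trait)     = {chi2_true:.1f}")
print(f"chi2 (measured trait) = {chi2_measured:.1f}")
print(f"ratio = {chi2_measured / chi2_true:.2f}")
```

Because the statistic scales roughly as n·r², measurement error that halves r² costs about as much power as halving the cohort, which is the intuition behind comparing MAE increases with sample-size reductions.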
3 223 Yisu Yang Artificial Intelligence in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface Accepted proceedings paper with poster presentation Yang A deep neural network estimation of brain age is sensitive to cognitive impairment and decline Yisu Yang, Aditi Sathe, Kurt Schilling, Niranjana Shashikumar, Elizabeth Moore, Logan Dumitrescu, Kimberly R. Pechman, Bennett A. Landman, Katherine A. Gifford, Timothy J. Hohman, Angela L. Jefferson, Derek B. Archer Vanderbilt Memory and Alzheimer's Center, Vanderbilt University School of Medicine, Nashville, TN, USA, 37212, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA, 37212, Vanderbilt University Institute of Imaging Science, Vanderbilt University Medical Center, Nashville, TN, USA, 37212, Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA, 37212, Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, USA, 37212, Department of Radiology & Radiological Sciences, Vanderbilt University Medical Center, Nashville, TN, USA, 37212, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA, 37212

The greatest known risk factor for Alzheimer’s disease (AD) is age. While both normal aging and AD pathology involve structural changes in the brain, their trajectories of atrophy are not the same. Recent developments in artificial intelligence have encouraged studies to leverage neuroimaging-derived measures and deep learning approaches to predict brain age, which has shown promise as a sensitive biomarker for diagnosing and monitoring AD. However, prior efforts primarily involved structural magnetic resonance imaging (MRI) and conventional diffusion MRI (dMRI) metrics without accounting for partial volume effects. To address this issue, we post-processed our dMRI scans with an advanced free-water (FW) correction technique to compute distinct FW-corrected fractional anisotropy (FA_FWcorr) and FW maps that allow tissue to be separated from fluid in a scan. We built three densely connected neural networks using FW-corrected dMRI, T1-weighted MRI, and combined FW+T1 features, respectively, to predict brain age. We then investigated the relationship of actual age and predicted brain age with cognition. We found that all models accurately predicted actual age in cognitively unimpaired (CU) controls (FW: r=0.66, p=1.62×10^-32; T1: r=0.61, p=1.45×10^-26; FW+T1: r=0.77, p=6.48×10^-50) and distinguished between CU and mild cognitive impairment participants (FW: p=0.006; T1: p=0.048; FW+T1: p=0.003), with FW+T1-derived age showing the best performance. Additionally, predicted brain age from all models was significantly associated with cross-sectional cognition (memory, FW: β=-1.094, p=6.32×10^-7; T1: β=-1.331, p=6.52×10^-7; FW+T1: β=-1.476, p=2.53×10^-10; executive function, FW: β=-1.276, p=1.46×10^-9; T1: β=-1.337, p=2.52×10^-7; FW+T1: β=-1.850, p=3.85×10^-17) and longitudinal cognition (memory, FW: β=-0.091, p=4.62×10^-11; T1: β=-0.097, p=1.40×10^-8; FW+T1: β=-0.101, p=1.35×10^-11; executive function, FW: β=-0.125, p=1.20×10^-10; T1: β=-0.163, p=4.25×10^-12; FW+T1: β=-0.158, p=1.65×10^-14). 
Our findings provide evidence that both T1-weighted MRI and dMRI measures improve brain age prediction and support predicted brain age as a sensitive biomarker of cognition and cognitive decline.  https://f1000research.com/posters/12-1553
4 239 Carl Yang Digital health technology data in biocomputing: Research efforts and considerations for expanding access Accepted proceedings paper with poster presentation Yang FedBrain: Federated Training of Graph Neural Networks for Connectome-based Brain Imaging Analysis Yi Yang, Han Xie, Hejie Cui, Carl Yang Emory University, Emory University, Emory University, Emory University Recent advancements in neuroimaging techniques have sparked growing interest in understanding the complex interactions between anatomical regions of interest (ROIs), which form brain networks that play a crucial role in various clinical tasks, such as neural pattern discovery and disorder diagnosis. In recent years, graph neural networks (GNNs) have emerged as powerful tools for analyzing network data. However, due to the complexity of data acquisition and regulatory restrictions, brain network studies remain limited in scale and are often confined to local institutions. These limitations make it difficult for GNN models to capture useful neural circuitry patterns and deliver robust downstream performance. As a distributed machine learning paradigm, federated learning (FL) offers a promising solution to resource limitations and privacy concerns by enabling collaborative learning across local institutions (i.e., clients) without data sharing. While data heterogeneity has been extensively studied in recent FL literature, cross-institutional brain network analysis presents unique heterogeneity challenges, namely inconsistent ROI parcellation systems and varying predictive neural circuitry patterns across local neuroimaging studies. To this end, we propose FedBrain, a GNN-based personalized FL framework that takes into account the unique properties of brain network data. 
Specifically, we present a federated atlas mapping mechanism to overcome the feature and structure heterogeneity of brain networks arising from different ROI atlas systems, and a clustering approach guided by clinical prior knowledge to address varying predictive neural circuitry patterns regarding different patient groups, neuroimaging modalities and clinical outcomes. Compared to existing FL strategies, our approach demonstrates superior and more consistent performance, showcasing its strong potential and generalizability in cross-institutional connectome-based brain imaging analysis.  
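The collaborative-training step that FL relies on can be sketched with federated averaging (FedAvg), in which clients share only model parameters and the server averages them, weighted by local sample size. This is a generic baseline, not FedBrain: its atlas mapping and clinically guided clustering for personalization are not shown, and the parameter names and values below are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened values

def fed_avg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Server-side FedAvg step: sample-size-weighted mean of each client parameter."""
    total = sum(client_sizes)
    averaged: Params = {}
    for name, values in client_params[0].items():
        averaged[name] = [
            sum(params[name][i] * size for params, size in zip(client_params, client_sizes)) / total
            for i in range(len(values))
        ]
    return averaged

# Hypothetical example: three institutions upload weights, never raw brain networks.
clients = [
    {"gnn.layer1": [0.2, 0.4], "classifier": [1.0]},
    {"gnn.layer1": [0.6, 0.0], "classifier": [0.0]},
    {"gnn.layer1": [0.1, 0.1], "classifier": [0.5]},
]
global_params = fed_avg(clients, client_sizes=[100, 50, 50])
print(global_params)
```

A personalized scheme in the spirit of the abstract could run this average separately within each cluster of similar clients instead of over all clients; that variant is likewise an assumption here.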
5 262 David Bonet Overcoming health disparities in precision medicine Accepted proceedings paper with poster presentation Bonet Machine Learning Strategies for Improved Phenotype Prediction in Underrepresented Populations David Bonet, May Levin, Daniel Mas Montserrat, Alexander G. Ioannidis Stanford University, Universitat Politècnica de Catalunya, University of California Santa Cruz Precision medicine models often perform better for populations of European ancestry due to the over-representation of this group in the genomic datasets and large-scale biobanks from which the models are constructed. As a result, prediction models may misrepresent or provide less accurate treatment recommendations for underrepresented populations, contributing to health disparities. This study introduces an adaptable machine learning toolkit that integrates multiple existing methodologies and novel techniques to enhance the prediction accuracy for underrepresented populations in genomic datasets. By leveraging machine learning techniques, including gradient boosting and automated methods, coupled with novel population-conditional re-sampling techniques, our method significantly improves the phenotypic prediction from single nucleotide polymorphism (SNP) data for diverse populations. We evaluate our approach using the UK Biobank, which is composed primarily of British individuals with European ancestry, and a minority representation of groups with Asian and African ancestry. Performance metrics demonstrate substantial improvements in phenotype prediction for underrepresented groups, achieving prediction accuracy comparable to that of the majority group. This approach represents a significant step towards improving prediction accuracy amidst current dataset diversity challenges. By integrating a tailored pipeline, our approach fosters more equitable validity and utility of statistical genetics methods, paving the way for more inclusive models and outcomes.  
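The abstract does not specify how population-conditional re-sampling works. One simple hypothetical version, sketched below, oversamples each ancestry group with replacement until it matches the size of the largest group; the function name, population labels, and group sizes are illustrative assumptions, not the authors' toolkit.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple

Sample = Tuple[str, Hashable]  # (population label, record id)

def population_balanced_resample(samples: Sequence[Sample], seed: int = 0) -> List[Sample]:
    """Oversample each population with replacement up to the size of the largest population."""
    rng = random.Random(seed)
    by_population = defaultdict(list)
    for sample in samples:
        by_population[sample[0]].append(sample)
    target = max(len(group) for group in by_population.values())
    balanced: List[Sample] = []
    for group in by_population.values():
        balanced.extend(group)  # keep every original sample once
        balanced.extend(rng.choices(group, k=target - len(group)))  # top up smaller groups
    rng.shuffle(balanced)
    return balanced

# Hypothetical training set dominated by one ancestry group.
training = [("EUR", i) for i in range(90)] + [("AFR", i) for i in range(6)] + [("SAS", i) for i in range(4)]
counts = Counter(pop for pop, _ in population_balanced_resample(training))
print(counts)
```

In practice such re-sampling would be applied to the training split only, so that duplicated minority samples never leak into evaluation.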
6 179 Francisco De La Vega Overcoming health disparities in precision medicine Accepted proceedings paper with poster presentation Rhead Imputation of race and ethnicity categories using genetic ancestry from real-world genomic testing data Brooke Rhead, Paige E. Haffener, Yannick Pouliot, and Francisco M. De La Vega Tempus Labs, Inc., Chicago, IL, 60654, USA The incompleteness of race and ethnicity information in real-world data (RWD) hampers its utility in promoting healthcare equity. This study introduces two methods—one heuristic and the other machine learning-based—to impute race and ethnicity from genetic ancestry using tumor profiling data. Analyzing de-identified data from over 100,000 cancer patients sequenced with the Tempus xT panel, we demonstrate that both methods outperform existing geolocation and surname-based methods, with the machine learning approach achieving high recall (range: 0.859-0.993) and precision (range: 0.932-0.981) across four mutually exclusive race and ethnicity categories. This work presents a novel pathway to enhance RWD utility in studying racial disparities in healthcare.  
7 254 Inyoung Jun Overcoming health disparities in precision medicine Accepted proceedings paper with poster presentation Jun Quantifying Health Outcome Disparity in Invasive Methicillin-Resistant Staphylococcus aureus Infection using Fairness Algorithms on Real-World Data Inyoung Jun, Sarah E. Ser, Scott A. Cohen, Jie Xu, Robert J. Lucero, Jiang Bian, Mattia Prosperi University of Florida, University of Florida, University of Florida, University of Florida, University of California Los Angeles, University of Florida, University of Florida

This study quantifies health outcome disparities in invasive methicillin-resistant Staphylococcus aureus (MRSA) infections by applying a novel artificial intelligence (AI) fairness algorithm, the Fairness-Aware Causal paThs (FACTS) decomposition, to real-world electronic health record (EHR) data. We spatiotemporally linked 9 years of EHRs from a large healthcare provider in Florida, USA, with contextual social determinants of health (SDoH). We first created a causal structure graph connecting SDoH with individual clinical measurements before or upon diagnosis of invasive MRSA infection, treatments, side effects, and outcomes; we then applied FACTS to quantify potential outcome disparities along different causal pathways involving SDoH, clinical, and demographic variables. We found moderate disparity with respect to demographics and SDoH, and all of the top-ranked pathways leading to outcome disparities by age, gender, race, and income included comorbidity. Prior kidney impairment, vancomycin use, and timing were associated with racial disparity, while income, rurality, and available healthcare facilities contributed to gender disparity. From an intervention standpoint, our results highlight the need for policies that consider both clinical factors and SDoH. In conclusion, this work demonstrates the practical utility of fairness AI methods in public health settings.  
8 221 Kathleen Cardone Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease Accepted proceedings paper with poster presentation Cardone Lymphocyte Count Derived Polygenic Score and Interindividual Variability in CD4 T-cell Recovery in Response to Antiretroviral Therapy Kathleen M. Cardone, Scott Dudek, Karl Keat, Yuki Bradford, Zinhle Cindi, Eric S. Daar, Roy Gulick, Sharon A. Riddler, Jeffrey L. Lennox, Phumla Sinxadi, David W. Haas, Marylyn D. Ritchie Department of Genetics at University of Pennsylvania, Department of Genetics at University of Pennsylvania, Genomics and Computational Biology Graduate Program at University of Pennsylvania, Department of Genetics at University of Pennsylvania, Department of Genetics at University of Pennsylvania, Division of Clinical Pharmacology and Department of Medicine at University of Cape Town, Lundquist Institute at Harbor-UCLA Medical Center, Weill Cornell Medicine, University of Pittsburgh, Emory University School of Medicine, Division of Clinical Pharmacology and Department of Medicine at University of Cape Town, Vanderbilt University Medical Center, Meharry Medical College, Department of Genetics at University of Pennsylvania, Institute for Biomedical Informatics at University of Pennsylvania

Access to safe and effective antiretroviral therapy (ART) is a cornerstone of the global response to the HIV pandemic. Among people living with HIV, absolute CD4 T-cell recovery on ART varies considerably, and the contribution of host genetics to this variability is not well understood. Because publicly available summary statistics for CD4 T-cell count are lacking, we explored whether a polygenic score derived from summary statistics for absolute lymphocyte count in the general population (PGS-lymph) is associated with CD4 T-cell recovery. We examined associations with baseline CD4 T-cell count prior to ART (n=4959) and with change from baseline to week 48 on ART (n=3274) among treatment-naïve participants in prospective, randomized ART studies of the AIDS Clinical Trials Group. We separately examined an African-ancestry-derived and a European-ancestry-derived PGS-lymph, and evaluated their performance across all participants and in the African and European ancestral groups separately. Multivariate models that included PGS-lymph, baseline plasma HIV-1 RNA, age, sex, and 15 principal components (PCs) of genetic similarity explained ~26-27% of variability in baseline CD4 T-cell count, but PGS-lymph accounted for <1% of this variability. Models that also included baseline CD4 T-cell count explained ~7-9% of variability in CD4 T-cell count increase on ART, but PGS-lymph again accounted for <1% of this variability. In univariate analyses, PGS-lymph was not significantly associated with baseline or change in CD4 T-cell count. The African PGS-lymph term was significantly associated with CD4 increase in the multivariate model but not in the univariate model. When applied to lymphocyte count in a general medical biobank population (Penn Medicine BioBank), PGS-lymph explained ~6-10% of variability in multivariate models (including age, sex, and PCs) but only ~1% in univariate models. In summary, a lymphocyte count PGS derived from the general population was not consistently associated with CD4 T-cell recovery on ART. 
Nonetheless, clinical covariates are critical in building polygenic scores.  
9 226 Pei Fen Kuan Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease Accepted proceedings paper with poster presentation Huang intCC: An efficient weighted integrative consensus clustering of multimodal data Can Huang, Pei Fen Kuan Stony Brook University, Stony Brook University

High-throughput profiling of multiomics data provides a valuable resource to better understand complex human diseases such as cancer and to potentially uncover new subtypes. Integrative clustering has emerged as a powerful unsupervised learning framework for subtype discovery. In this paper, we propose intCC, an efficient weighted integrative clustering method that combines an ensemble method, consensus clustering, and kernel learning integrative clustering. We show through extensive simulation studies and a case study on the TCGA pan-cancer datasets that intCC can accurately uncover latent cluster structures. An R package, intCC, implementing our proposed method is available at https://github.com/candsj/intCC.
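Consensus clustering summarizes many base clusterings in a co-clustering (consensus) matrix whose entry (i, j) is the weighted fraction of clusterings that place samples i and j together. The sketch below builds a weighted consensus matrix from one hypothetical clustering per omics layer; the weights are fixed by hand here, whereas how intCC chooses its weights and combines them with kernel learning is described in the paper and not reproduced.

```python
from typing import List, Sequence

def co_clustering_matrix(labels: Sequence[int]) -> List[List[float]]:
    """1.0 where two samples share a cluster label, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)] for i in range(n)]

def weighted_consensus(partitions: Sequence[Sequence[int]], weights: Sequence[float]) -> List[List[float]]:
    """Weighted average of the co-clustering matrices of several base clusterings."""
    total = sum(weights)
    n = len(partitions[0])
    consensus = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        m = co_clustering_matrix(labels)
        for i in range(n):
            for j in range(n):
                consensus[i][j] += w * m[i][j] / total
    return consensus

# Hypothetical base clusterings of 4 samples, one per omics layer.
mrna = [0, 0, 1, 1]
methylation = [0, 0, 0, 1]
mirna = [1, 1, 0, 0]  # label ids are arbitrary; only co-membership matters
C = weighted_consensus([mrna, methylation, mirna], weights=[0.5, 0.2, 0.3])
for row in C:
    print([round(v, 2) for v in row])
```

A final partition can then be obtained by clustering 1 - C as a distance matrix, for example with hierarchical clustering.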

 
10 245 Shefali Verma Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease Accepted proceedings paper with poster presentation Kember Polygenic risk scores for cardiometabolic traits demonstrate importance of ancestry for predictive precision medicine Rachel Kember, Shefali Verma, Anurag Verma, Brenda Xiao, Anastasia Lucas, Colleen Kripke, Renae Judy, Jinbo Chen, Scott Damrauer, Daniel Rader, Marylyn Ritchie University of Pennsylvania Polygenic risk scores (PRS) have predominantly been derived from genome-wide association studies (GWAS) conducted in individuals of European ancestry (EUR). In this study, we present an in-depth evaluation of PRS based on multi-ancestry GWAS for five cardiometabolic phenotypes in the Penn Medicine BioBank (PMBB), followed by a phenome-wide association study (PheWAS). We examine PRS performance across all individuals and separately in the African ancestry (AFR) and EUR groups. For AFR individuals, PRS derived using the multi-ancestry linkage disequilibrium (LD) panel showed a higher effect size for four of the five PRSs (diastolic blood pressure [DBP], systolic blood pressure [SBP], type 2 diabetes [T2D], and body mass index [BMI]) than those derived from the AFR LD panel. In contrast, for EUR individuals, the multi-ancestry LD panel PRS showed a higher effect size for two of the five PRSs (SBP and T2D) than the EUR LD panel. These findings underscore the potential benefits of using a multi-ancestry LD panel for PRS derivation across diverse genetic backgrounds and demonstrate overall robustness in all individuals. Our results also revealed significant associations between PRS and various phenotypic categories. For instance, the coronary artery disease (CAD) PRS was associated with 18 phenotypes in AFR and 82 in EUR individuals, while the T2D PRS was associated with 84 phenotypes in AFR and 78 in EUR individuals. 
Notably, associations with hyperlipidemia, renal failure, atrial fibrillation, coronary atherosclerosis, obesity, and hypertension were observed across different PRSs in both AFR and EUR groups, with varying effect sizes and significance levels. However, in AFR individuals, the strength and number of PRS associations with other phenotypes were generally lower than in EUR individuals. Our study underscores the need for future research to prioritize 1) conducting GWAS in diverse ancestry groups and 2) creating a cosmopolitan PRS methodology that is universally applicable across all genetic backgrounds. Such advances will foster a more equitable and personalized approach to precision medicine.  
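The PRS evaluations in this session rest on the same basic computation: a weighted sum of effect-allele dosages. The sketch below assumes per-allele effect sizes have already been estimated from GWAS summary statistics (the step where an AFR, EUR, or multi-ancestry LD reference panel would enter) and standardizes scores within an ancestry group, a common but here assumed convention. Variant IDs, effect sizes, and genotypes are hypothetical.

```python
import statistics
from typing import Dict, List

def polygenic_score(dosages: Dict[str, float], effect_sizes: Dict[str, float]) -> float:
    """Sum over variants of (per-allele effect size x effect-allele dosage in 0..2).

    Variants missing from an individual's genotype data are skipped.
    """
    return sum(beta * dosages[v] for v, beta in effect_sizes.items() if v in dosages)

def standardize(scores: List[float]) -> List[float]:
    """Z-score PRS values within one ancestry group (assumed convention)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

effect_sizes = {"rsA": 0.12, "rsB": -0.05, "rsC": 0.30}  # hypothetical variants and weights
group = [
    {"rsA": 2, "rsB": 1, "rsC": 0},
    {"rsA": 0, "rsB": 2, "rsC": 1},
    {"rsA": 1, "rsB": 0, "rsC": 2},
]
raw = [polygenic_score(person, effect_sizes) for person in group]
z = standardize(raw)
print([round(s, 3) for s in raw], [round(s, 2) for s in z])
```

An "effect size per PRS" in such comparisons is then typically the regression coefficient of the phenotype on the standardized score, which is what allows scores built with different LD panels to be compared on one scale.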
11 197 Xi Li Artificial Intelligence in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface Accepted proceedings paper with oral presentation Moore SynTwin: A graph-based approach for predicting clinical outcomes using digital twins derived from synthetic patients Jason H. Moore, Xi Li, Jui-Hsuan Chang, Nicholas P. Tatonetti, Dan Theodorescu, Yong Chen, Folkert W. Asselbergs, Mythreye Venkatesan, Zhiping Paul Wang Department of Computational Biomedicine, Cedars-Sinai Medical Center, West Hollywood, CA, Cedars-Sinai Cancer, Cedars-Sinai Medical Center, Los Angeles, CA, Department of Cardiology, Amsterdam University Medical Center, Amsterdam, The Netherlands, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, 
The concept of a digital twin originated in the engineering, industrial, and manufacturing domains, where virtual objects or machines are created to inform the design and development of real ones. This idea is appealing for precision medicine, where digital twins of patients could help inform healthcare decisions. We introduce a new approach, SynTwin, that combines synthetic data and network science to create digital twins for precision medicine. First, our approach estimates the distance between all subjects based on their available features. Second, the distances are used to construct a network with subjects as nodes and edges connecting subjects whose distance is less than the percolation threshold. Third, communities or cliques of subjects are defined. Fourth, a large population of synthetic patients is generated using a synthetic data generation algorithm that models the correlation structure of the data. Fifth, digital twins are selected from the synthetic patient population as those within a given distance of a subject community in the network. Finally, we compare community-based prediction of clinical endpoints using real subjects, digital twins, or both, within and outside of the community. Key to this approach are the digital twins, defined by patient similarity, which represent hypothetical unobserved patients with patterns similar to nearby real patients as defined by network distance and community structure. We apply SynTwin to predicting mortality in a population-based cancer registry (n=87,674) from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (USA). Our results demonstrate that nearest network neighbor prediction of mortality in this study is significantly improved with digital twins (AUROC=0.864, 95% CI=0.857-0.872) over using real data alone (AUROC=0.791, 95% CI=0.781-0.800). 
These results suggest a network-based digital twin strategy using synthetic patients may add value to precision medicine efforts.  
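The SynTwin steps above can be sketched on toy data. This is a simplified, hypothetical illustration rather than the authors' implementation: Euclidean distance is assumed, a fixed distance threshold stands in for the percolation threshold, connected components stand in for community detection, synthetic patients are taken as given rather than generated, and prediction is a plain k-nearest-neighbour average. Function names and data are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Record = Tuple[Point, float]  # (features, outcome)

def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def communities(points: Sequence[Point], threshold: float) -> List[int]:
    """Connected components of the graph that links subjects closer than `threshold`."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and euclidean(points[i], points[j]) < threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def assign_twins(real: Sequence[Point], labels: Sequence[int],
                 synthetic: Sequence[Record], radius: float) -> Dict[int, List[Record]]:
    """Keep a synthetic patient as a twin in its nearest real subject's community if within `radius`."""
    twins: Dict[int, List[Record]] = {}
    for features, outcome in synthetic:
        dists = [euclidean(features, r) for r in real]
        nearest = min(range(len(real)), key=dists.__getitem__)
        if dists[nearest] <= radius:
            twins.setdefault(labels[nearest], []).append((features, outcome))
    return twins

def knn_predict(query: Point, records: Sequence[Record], k: int = 3) -> float:
    """Mean outcome of the k records nearest to `query`."""
    nearest = sorted(records, key=lambda rec: euclidean(query, rec[0]))[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)

# Hypothetical toy data: 2 features, binary mortality outcome.
real = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
outcomes = [0.0, 0.0, 1.0, 1.0]
labels = communities(real, threshold=1.0)
synthetic = [((0.05, 0.1), 0.0), ((5.05, 4.9), 1.0), ((9.0, 9.0), 1.0)]
twins = assign_twins(real, labels, synthetic, radius=0.5)

community = labels[0]
members = [(f, o) for f, o, c in zip(real, outcomes, labels) if c == community]
print(labels, knn_predict((0.2, 0.1), members + twins.get(community, [])))
```

The third synthetic patient lies far from every real subject, so it is not kept as a twin; restricting twins to community neighbourhoods is what keeps them plausible stand-ins for unobserved nearby patients.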
12 175 Jeff Ferraro Artificial Intelligence in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface Poster only Ferraro Disparities in Clinical Workup of Rare Diseases Pose Challenges for AI: Hereditary Transthyretin Amyloidosis (hATTR) – A Case Study Jeffrey P Ferraro [1,2], Catherine Tcheandjieu [3,4,5], Craig C Teerlink [1,2], Fatal Y Agirl [1], Anthony Z Gao [1], Julie A. Lynch [1,2] [1] VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT, 84148, USA

[2] Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, 84108, USA

[3] Gladstone Institute for Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA

[4] Department of Epidemiology and Biostatistics, University of California San Francisco, CA, USA

[5] Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA

Introduction: Hereditary transthyretin amyloidosis (hATTR) is a rare genetic disorder; its causal variant, c.424G>A (p.Val142Ile, historically referred to as V122I), is carried by 3-4% of African American (AA) patients. Patients with hATTR experience diagnostic odysseys with delayed and inaccurate diagnoses. Clinicians often do not perform all the diagnostic testing necessary to differentiate hATTR or wild-type amyloidosis from other causes of heart failure. These dynamics make it very difficult for healthcare systems to develop accurate machine learning tools to identify hATTR patients. Of the 12,988,563 Veterans in the national VA population, 1,698,422 (13.1%) are AA, yet only 420 patients have received a clinical diagnosis of hATTR, while ~54,349 AA patients are likely to carry the disease-causing p.Val122Ile variant. Underdiagnosis and the lack of appropriate diagnostic workup for hATTR make it very difficult for AA Veterans to benefit from advances in artificial intelligence (AI) and machine learning tools to improve outcomes.



Methods: We analyzed medication data, ICD diagnosis codes, procedure codes, and patient lab data from the nationwide VA Health System (1991-2021). Our analysis focused on 121,995 AA patients with genetic information from the Million Veteran Program, of whom 3,850 are carriers of the p.Val122Ile variant. The top 25 discriminating features were used to build a Bayesian network model to identify patients at risk of carrying the p.Val122Ile variant.



Results: Model performance was as follows: sensitivity 1.84% (CI 1.44%, 2.32%), specificity 99.97% (CI 99.96%, 99.98%), positive predictive value (PPV) 65.74% (CI 56.36%, 74.04%), negative predictive value (NPV) 96.90% (CI 96.89%, 96.91%), accuracy 96.87% (CI 96.77%, 96.97%).



Conclusion: The low sensitivity suggests weak discriminative power in the available clinical data. Racial disparities in guideline-recommended diagnostic workups and underrepresentation of minority patients in precision medicine interventions will have a compounding impact as healthcare systems rely on AI tools to improve future care. A thorough diagnostic workup for rare diseases is essential to provide AI models with the clinical features required for optimal performance.
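The gap between accuracy and sensitivity follows from class imbalance, which the short calculation below makes explicit. The confusion-matrix counts are not reported in the abstract; they are back-calculated so that 3,850 carriers among 121,995 patients reproduce the reported sensitivity, specificity, PPV, NPV, and accuracy, assuming every patient was scored.

```python
from dataclasses import dataclass

@dataclass
class ConfusionMatrix:
    tp: int  # carriers flagged by the model
    fp: int  # non-carriers flagged by the model
    tn: int  # non-carriers not flagged
    fn: int  # carriers missed

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def always_negative_accuracy(self) -> float:
        """Accuracy of a trivial rule that never flags anyone."""
        return (self.tn + self.fp) / self.total

# Illustrative counts back-calculated from the reported percentages (3,850 carriers
# among 121,995 patients); the abstract does not report the confusion matrix itself.
cm = ConfusionMatrix(tp=71, fp=37, tn=118_108, fn=3_779)
for name in ("sensitivity", "specificity", "ppv", "npv", "accuracy", "always_negative_accuracy"):
    print(f"{name}: {100 * getattr(cm, name)():.2f}%")
```

A rule that never flags anyone would already reach about 96.84% accuracy, so the model's 96.87% accuracy says little on its own; sensitivity and PPV carry the information.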

 
13 199 Inbo Han Artificial Intelligence in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface Poster only Han Using Natural Language Processing to Identify Low Back Pain in Radiology Reports Inbo Han, Yeji Kim Department of Neurosurgery, CHA University School of Medicine, CHA Bundang Medical Center, Seongnam-si 13496, Republic of Korea, Research Competency Milestones Program of School of Medicine, CHA University School of Medicine, Seongnam-si 13496, Republic of Korea The most common methods for evaluating chronic low back pain (LBP) are X-ray examinations, computed tomography (CT), and magnetic resonance imaging (MRI). Many radiology reports are written as unstructured free text, so a pipeline for extracting clinical information from these reports is needed. This study aimed to develop natural language processing (NLP) systems to recognize radiologic findings associated with LBP in X-ray, CT, and MRI radiology reports. Using rule-based methods and a Bidirectional Encoder Representations from Transformers (BERT) model, we evaluated the presence of 23 radiologic findings associated with LBP in radiology reports. Through this research, an NLP system was developed to identify the necessary terms in free-text clinical data. Additionally, we investigated the feasibility of using this NLP system to predict persistence of chronic LBP and to help inform surgical decisions.  
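A rule-based component of the kind described above typically pairs finding-specific patterns with negation handling. The abstract does not list the 23 findings or the authors' rules, so the three regular expressions and the negation cues below are hypothetical, and the BERT model is not shown.

```python
import re
from typing import Dict, List

# Hypothetical patterns for three LBP-related findings (illustrative only).
FINDING_PATTERNS: Dict[str, re.Pattern] = {
    "disc_herniation": re.compile(r"\b(disc|disk)\s+(herniation|protrusion|extrusion)\b", re.IGNORECASE),
    "spinal_stenosis": re.compile(r"\b(spinal|canal|foraminal)\s+stenosis\b", re.IGNORECASE),
    "spondylolisthesis": re.compile(r"\bspondylolisthesis\b", re.IGNORECASE),
}
# Hypothetical negation cues, checked only before the finding within the same sentence.
NEGATION_CUES = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)

def detect_findings(report: str) -> List[str]:
    """Return the sorted names of findings mentioned without a preceding negation cue."""
    found = set()
    for sentence in re.split(r"(?<=\.)\s+", report):
        for name, pattern in FINDING_PATTERNS.items():
            match = pattern.search(sentence)
            if match and not NEGATION_CUES.search(sentence[: match.start()]):
                found.add(name)
    return sorted(found)

report = "Mild disc protrusion at L4-5. No spinal stenosis. Grade 1 spondylolisthesis of L5 on S1."
print(detect_findings(report))
```

Rules like these are transparent and easy to audit, while a fine-tuned BERT classifier can capture phrasing the patterns miss; combining the two is a common design in clinical NLP.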
14 219 Md Kamruzzaman Artificial Intelligence in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface Poster only Kamruzzaman Enhancing Health Monitoring Precision: Mitigating False Negatives with Smart Wearables and Generative AI Md Kamruzzaman, Jorge Sebastian Salinas, Uma Balakrishnan, Kunal Poorey

Sandia National Laboratories In our pursuit of refined health monitoring, our primary objective is to improve accuracy by minimizing false negatives, that is, instances where potential health concerns are overlooked. Using wearables such as smartwatches and Fitbit devices, we focus on real-time tracking of vital signs, particularly resting heart rate, for which reference ranges are defined by the CDC.

The crux of our work is the recognition that false negatives in health monitoring, where subtle warning signs go undetected, can have far-reaching and potentially severe consequences. To address this issue, we have developed a lightweight algorithm that categorizes irregularities in vital signs. Resting heart rate, a key metric with CDC-defined ranges, is the focus of our efforts to minimize false negatives.

However, acquiring high-quality real-world data for comprehensive health studies is inherently challenging because of resource limitations and biases. To overcome these hurdles, we turn to generative AI, specifically the Wasserstein Generative Adversarial Network (WGAN), to generate synthetic data that mimics real-world wearable data, creating a diverse and expansive dataset for thorough analysis.

Our methodology uses both real and synthetic datasets to address the uncertainty involved in choosing anomaly detection thresholds. By combining wearables with generative AI, our research aims to reshape health monitoring, offering a more nuanced and accurate understanding of potential health concerns while reducing false negatives.

Our overarching goal is to advance precision health monitoring by minimizing the occurrence of false negatives. Through this work, we aim to provide individuals and healthcare systems with a more comprehensive and reliable health assessment tool for future pandemics.
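The threshold trade-off discussed above can be made concrete by scoring a range rule on labelled readings. This is a hypothetical sketch: the 60 to 100 bpm band is the commonly cited adult resting-heart-rate range and is assumed here rather than taken from the study, and the labelled readings are invented, including in-range readings labelled abnormal (for example, relative to a person's own baseline). The WGAN data generator is not shown.

```python
from typing import List, Tuple

Reading = Tuple[float, bool]  # (resting heart rate in bpm, truly abnormal?)

def flag_abnormal(rhr: float, low: float, high: float) -> bool:
    """Rule-based flag: resting heart rate outside [low, high] bpm."""
    return rhr < low or rhr > high

def false_negative_rate(readings: List[Reading], low: float, high: float) -> float:
    """Share of truly abnormal readings that the rule fails to flag."""
    positives = [rhr for rhr, abnormal in readings if abnormal]
    missed = sum(not flag_abnormal(rhr, low, high) for rhr in positives)
    return missed / len(positives)

def false_positive_rate(readings: List[Reading], low: float, high: float) -> float:
    """Share of normal readings that the rule flags anyway."""
    negatives = [rhr for rhr, abnormal in readings if not abnormal]
    flagged = sum(flag_abnormal(rhr, low, high) for rhr in negatives)
    return flagged / len(negatives)

# Hypothetical labelled readings (real or synthetic in the study's setting).
readings = [(55, True), (98, True), (72, False), (104, True), (65, False), (92, True), (95, False)]
for high in (100, 90):
    print(high, false_negative_rate(readings, 60, high), false_positive_rate(readings, 60, high))
```

Narrowing the upper bound from 100 to 90 bpm removes both missed events in this toy set at the cost of one false alarm, which is the kind of threshold uncertainty the study sets out to explore.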
 
15 195 Sooyeon Lim Digital health technology data in biocomputing: Research efforts and considerations for expanding access Poster only Hye  Clinical implication of gut microbiota and cytokine response for the prognosis of COVID-19

H Seong [1,2,3]#, JH Kim [4]#, Y-H Han [5], HJ Hyun [1,3], JG Yoon [1,3], E Nham [1,3], JY Noh [1,2,3], HJ Chong [1,2,3], WJ Kim [1,2,3], SY Lim [2,3]* and JY Song [1,2,3]*

[1] Department of Internal Medicine, Korea University College of Medicine, Seoul, Republic of Korea

[2] Asia Pacific Influenza Institute, Korea University College of Medicine, Seoul, Republic of Korea

[3] Vaccine Innovation Center, Seoul, Republic of Korea

[4] Department of Internal Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea

[5] Department of Food and Nutrition, Chungbuk National University, Cheongju, Republic of Korea

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infects luminal cells in the gastrointestinal tract through ACE2 receptors. When SARS-CoV-2 binds the ACE2 receptor, which also participates in amino acid transport, it may cause imbalances in the gut flora. The microbiome regulates the body's immune response through innate and adaptive immunity. We therefore investigated whether the microbiome in the early stage of SARS-CoV-2 infection is associated with the prognosis of COVID-19. Thirty patients and sixteen healthy controls participated in this two-hospital cohort study. We collected blood, stool, and clinical records on days 0 (enrollment), 7, 14, and 28. Patients were categorized into four groups according to their clinical course: improvement from mild (A), improvement from moderate (B), improvement from severe (C), or deterioration (D). Microbial beta diversity differed significantly according to clinical course (p=0.003). Within the first 7 days after symptom onset, the difference in beta diversity between groups was not significant (p=0.218). Over time, however, the intergroup difference became apparent at days 8-14 (p=0.017). From day 15 after symptom onset, the difference between groups disappeared again (p=0.116 at days 15-28; p=0.133 at day 29 and later). In terms of microbial composition, the groups with clinical improvement (groups B and C) converged toward the control group at the convalescent stage, whereas group D, whose clinical course worsened, remained far from the control group. Group A remained distant from the control group despite clinical improvement, possibly because of underlying medical conditions. Microbial beta diversity also differed significantly by Charlson comorbidity index (p=0.001). In patients with COVID-19, microbiome composition may be affected by both the clinical course and comorbidities.
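The abstract reports beta-diversity differences and an intergroup "distance" from controls without naming the dissimilarity measure. As a minimal sketch, assuming Bray-Curtis dissimilarity on per-sample taxon abundances (a common choice for microbiome data, not confirmed by the abstract), a patient's distance from the control group could be computed as below. The function names and toy abundance vectors are hypothetical, and the reported p-values would come from a separate permutation-based test that is not shown.

```python
from typing import Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two taxon-abundance profiles.

    0.0 means identical composition; 1.0 means no shared taxa.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same taxa")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    # Sum of the smaller abundance for each taxon (the shared portion).
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1.0 - 2.0 * shared / total


def mean_distance_to_group(sample: Sequence[float],
                           group: Sequence[Sequence[float]]) -> float:
    """Average Bray-Curtis dissimilarity of one sample to each member of a group."""
    return sum(bray_curtis(sample, member) for member in group) / len(group)


# Hypothetical toy example: three taxa, two control samples, one patient sample.
controls = [[10, 0, 5], [8, 2, 5]]
patient = [0, 10, 5]
print(mean_distance_to_group(patient, controls))
```

Tracking this mean distance per group and time window is one simple way to express "closer to" or "far from" the control group; it is descriptive only.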

 
16 180 Patrick Lawrence Drug-repurposing and discovery in the era of “big” Real-World Data: Accepted proceedings paper with oral presentation Xiang Modeling Path Importance for Effective Alzheimer’s Disease Drug Repurposing Shunian Xiang, Patrick J. Lawrence, Bo Peng, ChienWei Chiang, Dokyoon Kim, Li Shen, Xia Ning The Ohio State University, The Ohio State University, The Ohio State University, The Ohio State University, University of Pennsylvania, University of Pennsylvania, The Ohio State University Recently, drug repurposing has emerged as an effective and resource-efficient paradigm for Alzheimer’s disease (AD) drug discovery. Among various methods for drug repurposing, network-based methods have shown promising results because they can leverage complex networks that integrate multiple interaction types, such as protein-protein interactions, to identify candidate drugs more effectively. However, existing approaches typically assume that paths of the same length in the network are equally important in identifying the therapeutic effect of drugs. Work in other domains has found that paths of the same length do not necessarily have the same importance, so relying on this assumption may hinder drug repurposing attempts. In this work, we propose MPI (Modeling Path Importance), a novel network-based method for AD drug repurposing. MPI is unique in that it prioritizes important paths via learned node embeddings, which effectively capture a network’s rich structural information and thus allow MPI to differentiate the importance of paths. We evaluate MPI against a commonly used baseline method that identifies anti-AD drug candidates primarily based on the shortest paths between drugs and AD in the network. Among the top-50 ranked drugs, MPI prioritizes 20.0% more drugs with anti-AD evidence than the baseline. 
Finally, Cox proportional hazards models built from insurance claims data identify the use of etodolac, nicotine, and blood-brain barrier (BBB)-crossing ACE inhibitors (ACE-INHs) as associated with a reduced risk of AD, suggesting that these drugs may be viable candidates for repurposing and should be explored further in future studies.  
17 172 Fatai Agiri Drug-repurposing and discovery in the era of “big” Real-World Data: Poster only Teerlink The effect of thiazolidinediones on survival of diabetes patients with solid tumors Craig C. Teerlink, Tyler J. Nelson, Julie A. Lynch, Brent S. Rose, Kyung Min Lee, Richard L. Hauger VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT, 84148, USA., VA San Diego Healthcare System, La Jolla, CA, USA., VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT, 84148, USA., VA San Diego Healthcare System, La Jolla, CA, USA., VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT, 84148, USA., VA San Diego Healthcare System, La Jolla, CA, USA.

Introduction: Previous research has suggested that thiazolidinediones (TZDs), used to manage type 2 diabetes mellitus, may affect survival rates of patients with solid tumors. We examined survival rates of patients with solid tumors who received TZD treatment for diabetes in a large, nationwide healthcare system.



Methods: We processed medication data and phecode disease classifications derived from the nationwide Veterans Administration Health Record System from 1991 to 2021. We identified 209,012 patients with at least two diabetes diagnoses on different dates and no cancer diagnosis before their first diabetes diagnosis. Of these, 36,263 subjects had exposure to TZDs. Within the diabetes cohort, we identified subjects with at least two diagnoses of solid tumors, including bladder (n=5,272), colorectal (n=7,037), lung (n=6,775), prostate (n=15,950), and thyroid (n=699) cancer. We used multivariable Cox proportional hazards regression to measure associations between TZD use and the incidence of each cancer. TZD use was modeled as a time-varying covariate from the first to the last TZD prescription, and analyses were adjusted for age, race, ethnicity, sex, and body mass index.
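The Methods describe TZD use as a time-varying covariate spanning the first to the last prescription. A minimal sketch of how one subject's follow-up might be split into counting-process (start, stop] rows for such a Cox model is shown below. It assumes exposure is coded 1 only inside the first-to-last prescription window and 0 before and after it, and that times are in days from cohort entry; the names `Interval` and `split_follow_up` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    start: float  # interval is (start, stop], in days from cohort entry
    stop: float
    tzd: int      # 1 inside the assumed first-to-last TZD prescription window
    event: int    # 1 if the cancer diagnosis occurs at `stop`


def split_follow_up(end: float, event: bool,
                    first_rx: Optional[float] = None,
                    last_rx: Optional[float] = None) -> List[Interval]:
    """Split one subject's follow-up (0, end] into counting-process rows."""
    if first_rx is None or first_rx >= end:
        # No TZD exposure during follow-up: a single unexposed interval.
        return [Interval(0.0, end, 0, int(event))]
    # Exposure window ends at the last prescription, truncated at end of follow-up.
    window_end = min(last_rx if last_rx is not None else first_rx, end)
    cuts = sorted({0.0, first_rx, window_end, end})
    rows = [
        Interval(lo, hi, int(first_rx <= lo and hi <= window_end), 0)
        for lo, hi in zip(cuts, cuts[1:])
    ]
    # An event, if any, can only occur at the end of the last interval.
    rows[-1] = rows[-1]._replace(event=int(event))
    return rows


# Hypothetical subject: TZD from day 20 to day 60, diagnosed on day 100.
for row in split_follow_up(100.0, True, 20.0, 60.0):
    print(row)
```

Rows in this long format, together with the adjustment covariates, can be passed to a Cox implementation that accepts start/stop data, for example `CoxTimeVaryingFitter` in the Python lifelines package.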



Results: Long-term exposure to TZDs was significantly associated with increased incidence of prostate cancer (HR=1.23, p<0.001) and decreased incidence of lung (HR=0.59, p<0.001), bladder (HR=0.50, p<0.001), and colorectal cancer (HR=0.85, p=0.01), with no significant association for thyroid cancer (HR=0.97, p=0.87).



Conclusions: The decreased incidence of several solid tumors (lung, bladder, and colorectal) indicates that TZDs may be strong candidates for drug repurposing strategies. These results warrant replication in external datasets.

 
18 178 Yonghyun Nam Drug-repurposing and discovery in the era of “big” Real-World Data: Poster only Nam Integrative Network-based Drug Scoring for Precision Drug Repurposing: Leveraging heterogeneous biological inferences through graph-based semi-supervised learning Yonghyun Nam, Sang-Hyuk Jung, Xia Ning, Li Shen, Dokyoon Kim University of Pennsylvania, University of Pennsylvania, The Ohio State University, University of Pennsylvania, University of Pennsylvania As our understanding of biological mechanisms advances, well-curated heterogeneous knowledge networks have made more precise computational drug repurposing possible. However, incorporating heterogeneous networks into drug repurposing poses challenges in scalability and interpretability. To address this problem, we propose a comprehensive network-based drug scoring algorithm that incorporates both data-driven and knowledge-driven networks through graph-based semi-supervised learning, enabling its application to drug repurposing. This method transfers information from data-driven networks to knowledge-driven networks, thereby capturing complex interactions across networks. For the data-driven network, we constructed a disease-disease network from UK Biobank PheWAS summary data, connecting diseases based on their shared genetic variants. For the knowledge-driven network, we constructed a drug-target protein multi-layered network by collecting drug-target protein and protein-protein interaction information from publicly available databases. Although the data- and knowledge-driven networks are not directly connected, we transfer disease-disease association information to the drug-target protein network through graph-based semi-supervised learning. In this process, protein-protein interaction networks serve as the input network, and drug-drug networks serve as the output network. 
After disease information is transferred, the predicted scores on the output drug-drug network incorporate all heterogeneous networks through latent inference. To evaluate the predicted drug scores, we compared them with known disease-drug associations. As a proof-of-concept study, we applied the drug scoring to rheumatoid arthritis, asthma, and multiple sclerosis. Overall, the average AUC of the proposed method was 0.701, a 9.71% improvement over the average AUC of 0.639 for conventional multi-layered networks. Our comprehensive framework provides a valuable tool for identifying new therapeutic opportunities through drug repurposing, contributing to efficient and cost-effective drug discovery. This approach has the potential to revolutionize drug repurposing research and advance personalized medicine for diverse diseases.   
19 236 Kunal Poorey Drug-repurposing and discovery in the era of “big” Real-World Data: Poster only Poorey MIRA: Machine Intelligence for Rapid Acceleration of Drug Discovery and Repurposing Kunal Poorey, Md Kamruzzaman Sandia National Labs, Sandia National Labs
A continuous, healthy stockpile, uninterrupted and reliable sources of drugs, and the ability to develop new drugs to counter new and emerging diseases are essential to the nation's healthcare response capability. Supply chain issues, proprietary formulations, drug resistance, and lack of treatment due to single-source raw materials or single-source products can hinder access to proper medical countermeasures. Machine learning (ML)-aided drug discovery can significantly reduce these bottlenecks by making discovery faster and more cost-effective. Using extensive datasets, ML algorithms identify potential drugs, predict interactions with biological targets, and optimize chemical properties for effective manufacturing. New drug development also helps improve treatment efficacy and reduce development costs. Further, "explainable" machine learning (XML) enhances our understanding of structure-property relationships, aiding the design of effective and safer drugs by making model recommendations interpretable, identifying new drug targets, and predicting side effects.



Here, we present MIRA (Machine Intelligence for Rapid Acceleration of Drug Discovery and Repurposing), a data-driven predictive model for accelerated drug discovery. Using publicly available databases, we compiled a database of drug compounds and their properties, such as toxicity, solubility, and permeability. MIRA uses cheminformatics to predict the properties of both existing and synthetic structures, offering ranked alternative compounds for repurposing, elucidating structure-property relationships, and supporting the design of new compounds. This approach accelerates experimental testing and validation through in silico screening, reducing the experimental burden. Additionally, we demonstrate the use of generative models based on large language models to generate novel drug-like molecules. Together, these approaches point toward more accurate and informed drug discovery.

 
20 202 Philip Freda Overcoming health disparities in precision medicine Accepted proceedings paper with oral presentation Orlenko Cluster Analysis reveals Socioeconomic Disparities among Elective Spine Surgery Patients. Alena Orlenko, Philip J. Freda, Attri Ghosh, Hyunjun Choi, Nicholas Matsumoto, Tiffani J. Bright, Corey T. Walker, Tayo Obafemi-Ajayi, Jason H. Moore Cedars-Sinai Medical Center, Department of Computational Biomedicine, Los Angeles, California, USA

Cedars-Sinai Medical Center, Department of Neurosurgery, Los Angeles, California, USA

Missouri State University, Engineering Program, Springfield, Missouri, USA
Bias in clinical environments introduced by socioeconomic factors frequently leads to suboptimal surgical outcomes, disproportionately affecting economically disadvantaged individuals and underrepresented racial minorities. In this study, we show how cluster analysis can pinpoint socioeconomic subpopulations within surgical cohorts that are at heightened risk of poor outcomes. Our methodology uses an automated clustering process that compares results across seven algorithms and selects the optimal clustering output. We validate the key features driving cluster results using the automated machine learning package TPOT. Across clustering approaches, we consistently identify two main clusters driven by insurance type (commercial vs. Medicare). Further exploration reveals that patients in the Medicare group tend to have poorer overall health and medical histories. Additionally, within these groups, poorer outcomes are more prevalent among African American patients, irrespective of insurance type. These findings suggest that data-driven stratification could guide the design of machine learning models that are fair and exhibit minimal bias.  
21 187 Charleston Chiang Overcoming health disparities in precision medicine Poster only Dinh A reference panel to improve genotype imputation for Native Hawaiians and Japanese Americans Bryan L. Dinh, Xinran Wang, Xin Sheng, Echo Tang, Fei Chen, Erica Young, Kekoa Taparra, Stephane E. Castel, Tony R. Merriman, Lynne R. Wilkens, Loïc Le Marchand, Ira Hall, Nathan Stitziel, Christopher A. Haiman, and Charleston W. K. Chiang University of Southern California, University of Southern California, University of Southern California, University of Southern California, University of Southern California, Washington University St. Louis, Washington University St. Louis,

Stanford University, Variant Bio Inc., University of Otago and University of Alabama at Birmingham, University of Hawai‘i, University of Hawai‘i, Yale University,  Washington University St. Louis, University of Southern California, University of Southern California

Imputation is now a fundamental component of well-powered association studies. While public imputation reference panels have grown in size, representation, and accessibility, the accuracy and effectiveness of imputation remain low for global populations whose ancestries are poorly represented in panels such as TOPMed. We created an imputation reference panel comprising 10,721 whole-genome sequenced individuals from the Multiethnic Cohort (MEC), representing five ethnic groups: African Americans (1,270), Native Hawaiians (1,065), Japanese Americans (3,105), Latinos (4,141), and non-Hispanic whites (1,140). In total, the reference panel contains 82M SNPs and 5M indels passing quality control. Although our panel is approximately 10% the size of TOPMed, it outperforms TOPMed for poorly represented populations such as Eastern Polynesians (e.g., Native Hawaiians, N = 3,276) and East Asians (e.g., Japanese Americans, N = 4,176). For example, for alleles with 1-5% frequency, the mean imputation Rsq for Native Hawaiians and Japanese Americans is 0.921 and 0.908, respectively, compared to 0.886 and 0.751 when imputed with TOPMed. The MEC panel also imputes an externally validated set of Western Polynesian-enriched variants with higher accuracy than TOPMed. These improvements suggest that our reference panel may also benefit other Pacific Island populations with Polynesian ancestry. Lastly, we find that a meta-imputation approach combining the results of both the MEC and TOPMed imputations can further improve imputation accuracy. Overall, because this improvement from increased representation does not require sequencing a prohibitive number of individuals, our study highlights the urgent need to generate genomic resources that will enable better association studies for underserved populations across the globe. 
https://drive.google.com/file/d/1mp-lo35jVHLy6af0tOw9mjbM3ESriIO2/view?usp=sharing
22 192 Jibril Hirbo Overcoming health disparities in precision medicine Poster only Hirbo The cost of inappropriate prediction algorithms on the health of minority individuals in large care-based Electronic Health Record Jibril Hirbo1,2, Neha Arora1, Peter Straub1,2, Kangcheng Hou3, & Nancy Cox1,2 1Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, United States

2Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, United States

3Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA

Individuals of African ancestry in the US bear significant and disproportionate health burdens by most measures of health. Delivering high-quality, efficient, and equitable healthcare to all Americans requires clinical systems that use appropriate models capturing the full complexity of diverse US populations. Clinical laboratory tests are at the heart of some of the most widely used algorithms for diagnosing chronic diseases and assessing disease progression and treatment response. Clinicians take or recommend specific clinical follow-up if values fall outside predefined reference intervals (RIs), which were developed years, often decades, ago from small samples of individuals of European ancestry. The burden of RIs that fail to capture the diversity of populations in our healthcare systems falls disproportionately on US minorities, and the costs of the resulting repeat testing, under- and misdiagnosis, and under- or over-treatment of disease have never been systematically estimated. We performed a comprehensive analysis of population differences in clinical values and quantified the resulting disparities in clinical follow-up between individuals of European and African ancestries in BioVU, the Vanderbilt University Medical Center biobank (n=3.4 million). Of >1,000 clinical values, we analyzed 709 clinical laboratory values measured in ≥200 individuals of each ancestry and identified 306 (>40%) that show systematic mean differences in levels. We use neutrophils and vitamin D as canonical examples of these laboratory values to show the roles of genetic and non-genetic factors in explaining the observed systematic differences. We then used prediction models that incorporate common and rare genetic variation and demographic parameters to place individual levels on the same scale across the two ancestries, facilitating clinical implementation. 
Our results could advance health equity and will inform stakeholders about possible options for reducing inequities created by RIs poorly matched to diverse populations.  
23 206 Vincent Lam Overcoming health disparities in precision medicine Poster only Lam Genetic Ancestry and Socioeconomic Deprivation Interact to Influence Type 2 Diabetes Risk in the All of Us Cohort Vincent Lam, Shivam Sharma, Sonali Gupta, John Spouge, King Jordan, Leonardo Mariño-Ramírez National Institute on Minority Health and Health Disparities, Georgia Institute of Technology, National Center for Biotechnology Information Diabetes is highly prevalent in the United States and has negative consequences for both health and productivity. Type 2 diabetes (T2D) accounts for roughly 90% of diabetes cases in the country and disproportionately affects those identifying as Black or Hispanic and those with high socioeconomic deprivation (SED). The NIH All of Us program, which has developed a diverse population biobank, provides rich opportunities to study how racial and ethnic identity, genetic ancestry, and SED interact to affect T2D risk. Data from the program were used to construct the study cohort. SED was quantified with metrics at the level of the individual (iSDI) and the zip code (zSDI). The All of Us Researcher Workbench was used to calculate T2D prevalence estimates and to model T2D risk as a function of varying combinations of self-identified race and ethnicity (SIRE), genetic ancestry, iSDI, and zSDI. The study cohort spans 86,488 participants from the four largest SIRE groups in All of Us: Asian (n=2,311), Black (n=16,282), Hispanic (n=16,966), and White (n=50,292). The Black and Hispanic SIRE groups have the highest T2D prevalence estimates and the highest average SED based on zSDI. Consistent with prior studies, SED was generally associated with T2D risk. However, higher SED was associated with lower T2D risk within the Black and Hispanic subgroups, in contrast to a similar study we conducted using UK Biobank data, in which minority group membership interacted with SED to amplify T2D risk. 
These findings may reflect the healthy immigrant effect or limitations of the zSDI metric. Furthermore, the contribution of Black and Hispanic SIRE group membership to T2D risk is reduced when controlling for SED, suggesting that SED drives T2D disparities.  https://f1000research.com/posters/12-1530
24 176 Joon-Yong An Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease Poster only Kim CWAS-Plus: Estimating category-wide association of rare noncoding variation from whole-genome sequencing data Yujin Kim1,2,3,†, Minwoo Jeong4,†, In Gyeong Koh1,2,3, Hyeji Lee1,2,3, Jae Hyun Kim1,2,3, Ron Yurko5, Donna Werling6,7, Stephan Sanders8,9,10, Joon-Yong An1,2,3,4,* 1 Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea

2 Transdisciplinary Major in Learning Health Systems, Department of Healthcare Sciences, Graduate School, Korea University, Seoul, 02841, Republic of Korea

3 BK21FOUR R&E Center for Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea

4 School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, 02841, Republic of Korea

5 Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA

6 Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA

7 Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, 53706, USA

8 Department of Psychiatry and Behavioral Sciences, Weill Institute for Neuroscience, University of California, San Francisco, CA, 94143, USA

9 Institute for Human Genetics, University of California, San Francisco, CA, 94158, USA

10 Institute for Developmental and Regenerative Medicine, Old Road Campus, Roosevelt Dr, Headington, Oxford, OX3 7TY, UK

† Joint Authors
The noncoding genome contains regulatory elements that play a critical role in human development. Advances in whole-genome sequencing (WGS) technologies now allow us to explore mutations within these regulatory elements. Meanwhile, recently accumulated single-cell data allow investigation of cell-type-specific regulatory elements, facilitating the identification of cell-type-specific noncoding mutations associated with diseases. A genome-wide evaluation of noncoding mutations using WGS data requires an analytic framework that integrates diverse functional annotations with WGS data quickly and easily and that handles correction for multiple testing. Our study aims to develop CWAS-Plus, a statistical framework that performs a category-wide association test for noncoding variants and provides efficient analysis of genome-wide noncoding associations. CWAS-Plus conducts a genome-wide assessment of noncoding associations using WGS data by integrating functional annotation datasets, including cell-type-specific enhancers and promoters. Variants are grouped into combinations of functional annotations, referred to as categories in CWAS-Plus, allowing an enrichment test for qualifying variants. To correct for multiple testing, we computed an effective number of tests based on correlations between the categories. CWAS-Plus also explores relationships between noncoding categories and disease risk through network analysis. To evaluate the performance of CWAS-Plus, we conducted a thorough assessment using WGS data from 1,991 families with autism spectrum disorder (ASD). We implemented CWAS-Plus as a fast and user-friendly Python package for efficiently detecting risks associated with diverse functional annotations through effective multiple hypothesis testing (https://cwas-plus.readthedocs.io/en/latest/). 
From annotation to noncoding association testing, CWAS-Plus can process WGS data from approximately 4,000 individuals, along with various functional annotations including single-cell epigenome data, within 2 hours. We identified noncoding categories enriched for ASD, particularly highlighting regulatory elements of excitatory neurons. We therefore present CWAS-Plus as an analytic framework applicable to investigating diverse genomic disorders in future studies.
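The abstract says only that the effective number of tests was computed from correlations between categories. As an illustrative stand-in, and not necessarily the estimator CWAS-Plus uses, the sketch below implements Nyholt's (2004) eigenvalue-variance estimator, M_eff = 1 + (M - 1)(1 - Var(λ)/M). It relies on the fact that for a symmetric correlation matrix R the eigenvalues average 1 and their squares sum to the sum of squared entries of R, so no eigen-decomposition is needed. The function names are hypothetical.

```python
from typing import Sequence


def effective_number_of_tests(corr: Sequence[Sequence[float]]) -> float:
    """Nyholt-style effective number of independent tests from a correlation matrix.

    Uses sum(lambda_i**2) == trace(R @ R) == sum of squared entries of R
    (valid for symmetric R), so eigenvalues never have to be computed.
    """
    m = len(corr)
    if m < 2:
        return float(m)
    sum_sq = sum(r * r for row in corr for r in row)
    # Eigenvalues of a correlation matrix have mean 1, so their sample
    # variance is sum((lambda_i - 1)**2) / (m - 1) = (sum_sq - m) / (m - 1).
    var_lambda = (sum_sq - m) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var_lambda / m)


def bonferroni_alpha(corr: Sequence[Sequence[float]], alpha: float = 0.05) -> float:
    """Per-test significance threshold using the effective number of tests."""
    return alpha / effective_number_of_tests(corr)


# Hypothetical correlation between two annotation categories.
print(effective_number_of_tests([[1.0, 0.5], [0.5, 1.0]]))
```

For fully independent categories (identity matrix) the estimate equals the number of categories, and for perfectly correlated categories it falls to 1; other estimators, such as Li and Ji's, are also in common use.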

 
25 229 Kord Kober Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease Poster only Kober Differential Methylation of Expression-Associated CpG Loci in an Enhancer Region is Associated with Morning Cancer-Related Fatigue Severity Caroline Lee, Maureen Lewis, Liam Berger, Ritu Roy, Sue Yom, Nam Woo Cho, Adam Olshen, Christine Miaskowski, Kord M. Kober University of California San Francisco School of Nursing, University of California San Francisco School of Nursing, University of California San Francisco School of Nursing, University of California San Francisco Helen Diller Family Comprehensive Cancer Center, University of California San Francisco School of Medicine, University of California San Francisco School of Medicine, University of California San Francisco Helen Diller Family Comprehensive Cancer Center, University of California San Francisco School of Nursing, University of California San Francisco School of Nursing
Cancer-related fatigue (CRF) is the most common symptom associated with cancer and its treatment. A lack of knowledge of the underlying mechanisms limits the development of efficacious interventions. By using multiple types of complementary omics data (i.e., transcriptomics, epigenomics), we aim to identify molecular characteristics associated with morning CRF (AM CRF) severity to increase understanding of the mechanisms underlying AM CRF. The first aim of this study is to identify expression-associated CpG (eCpG) loci in one group of oncology patients receiving chemotherapy (CTX). Our second aim is to evaluate whether differential methylation of distal eCpG regions (identified in aim 1) is associated with AM CRF severity in an independent group. Patients completed questionnaires during the week before their next treatment. AM CRF severity was evaluated using the Lee Fatigue Scale. RNA expression and DNA methylation were measured in peripheral blood from two distinct groups of oncology patients (n=115; n=584). Distal eCpGs (n=25,310) were identified using eQTM mapping in the first group. Differential methylation of these distal eCpGs was evaluated in the second group of patients (n=224 High, n=360 Low AM CRF). The final model included significant demographic and clinical characteristics and three surrogate variables. Differentially methylated distal regions were identified using a sliding window approach. Using Fisher's method, one region was found to be significantly differentially methylated (FDR < 0.0002). CpGs in this region mapped to the expression of three genes (ONECUT1, CLIP, and SLCO3A1). ONECUT1 is a transcription factor involved in gene transcription, glucose metabolism, and cell cycle regulation. Genetic variation in SLCO3A1 is associated with the occurrence of chronic fatigue syndrome. These findings may help identify novel therapeutic targets. 
These methods may be generalizable to other symptoms and patient-reported outcomes associated with cancer (e.g., sleep disturbance, depression, anxiety, dyspnea, pain) and its treatment (e.g., combined CTX and radiation). https://f1000research.com/posters/12-1558
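The abstract names a sliding-window approach and Fisher's method without further detail. A minimal sketch, assuming per-CpG p-values from the differential methylation model are already available and that each window spans a fixed number of base pairs starting at each CpG (`width` and `min_sites` are hypothetical parameters, as are the function names), could look like this. Fisher's statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under independence, and for even degrees of freedom its tail probability has a closed form.

```python
import math
from typing import List, Sequence, Tuple


def fisher_combined_p(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k d.f. under H0."""
    k = len(pvalues)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need at least one p-value in (0, 1]")
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Chi-square survival function for even d.f. 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def sliding_window_fisher(sites: Sequence[Tuple[int, float]], width: int,
                          min_sites: int = 2) -> List[Tuple[int, int, float]]:
    """Fisher-combine per-CpG p-values in windows of `width` bp starting at each CpG.

    `sites` holds (position, p-value) pairs; returns (first_pos, last_pos, combined_p).
    """
    ordered = sorted(sites)
    results = []
    for i, (start, _) in enumerate(ordered):
        window = []
        for pos, p in ordered[i:]:
            if pos >= start + width:
                break  # positions are sorted, so no later CpG falls in this window
            window.append((pos, p))
        if len(window) >= min_sites:
            combined = fisher_combined_p([p for _, p in window])
            results.append((start, window[-1][0], combined))
    return results


# Hypothetical CpGs: two nearby sites and one distant site.
print(sliding_window_fisher([(100, 0.05), (150, 0.05), (900, 0.5)], width=200))
```

Because neighboring CpGs are usually correlated, the plain Fisher combination is anti-conservative, and an FDR correction across windows (as the abstract reports) would still be needed; SciPy's `scipy.stats.combine_pvalues` with `method="fisher"` computes the same combination.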
26 233 Eunjung Alice Lee Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease Poster only Zhao Whole genome retroelement analyses reveal diagnostic and therapeutic opportunities in genetic disorders Boxun Zhao, Minh A. Nguyen, Sijae Woo, Jinkuk Kim, Arthur S. Lee, Elizabeth C. Engle, Vijay S. Ganesh, Anne O'Donnell-Luria, Alan H. Beggs, Timothy W. Yu, Eunjung Alice Lee Boston Children’s Hospital, Boston and Harvard Medical School, Boston Children’s Hospital, Boston and Harvard Medical School, Korea Advanced Institute of Science and Technology, Korea Advanced Institute of Science and Technology, Boston Children’s Hospital, Boston and Harvard Medical School, Boston Children’s Hospital, Boston and Harvard Medical School, Broad Institute of MIT and Harvard, Boston Children’s Hospital, Boston and Harvard Medical School, Boston Children’s Hospital, Boston and Harvard Medical School, Boston Children’s Hospital, Boston and Harvard Medical School, Boston Children’s Hospital, Boston and Harvard Medical School





Certain classes of genetic variation still escape detection in clinical sequencing analysis. One such class is retroelement insertion, which has been reported as a cause of Mendelian diseases and may have unique therapeutic implications. To understand the contribution of retroelements to genetic diseases, we analyzed whole-genome sequencing (WGS) data from 237 individuals with ataxia-telangiectasia (A-T). We identified 15 patients bearing retroelement insertions in the causative ATM gene; all but one of these insertions landed in noncoding regions, highlighting the advantages of WGS over exome sequencing. Notably, 13 (~5.5%) patients carried one of five distinct pathogenic insertions. We found one exonic Alu insertion, as well as three intronic Alus that landed in close proximity (<50 nt) to exon-intron boundaries; the latter led to different levels of exon skipping, as validated by RNA sequencing, RT-PCR, and/or minigene splicing assays. Beyond Alu insertions, we also resolved a deep intronic DUSP16 pseudogene insertion that caused loss of ATM function by activating cryptic splice sites. By also analyzing other case collections, we identified de novo L1Hs and SVA insertions as disease-causing variants. Some apparently de novo insertions originated from parental mosaicism and could be detected in blood sequencing data.



The discovery of splice-altering insertions may present therapeutic opportunities for splice-switching antisense oligonucleotides (ASOs). For the DUSP16 insertion, we developed proof-of-concept ASOs that suppress cryptic exonization, supporting the amenability of some splice-altering insertions to RNA-based ASO intervention. Leveraging large patient cohorts, we identified cases involving all types of active retroelements (L1Hs, Alu, SVA, and pseudogene insertions). We provide an initial estimate of 5.5% (13/237 cases) for the contribution of retroelements to the genetic architecture of recessive Mendelian disorders. Our study underscores the importance of retroelement insertions as an underexplored source of pathogenic genetic variation and therapeutic opportunities.

https://drive.google.com/file/d/1lsiRUAeKip5rBItc2RdJobCZE5DAkMw8/view?usp=drive_link
27 224 Tyler Nelson Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease Poster only Nelson Impact of germline HSD3B1 genetic variant 1245C on development of Alzheimer's disease and related dementia diagnoses among Veterans diagnosed with prostate cancer Tyler J. Nelson, Craig Teerlink, Kathryn Pridgen, Mark W. Logue, Rui Zhang, Brent Rose, Julie A. Lynch*, Richard L. Hauger*   *Co-senior authors contributing equally Tyler J. Nelson: VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT; Department of Radiation Medicine and Applied Sciences, University of California San Diego, La Jolla, CA

Craig Teerlink: VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT; Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT

Kathryn Pridgen: VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT; Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT

Mark W. Logue: National Center for PTSD, Behavioral Sciences Division, VA Boston Healthcare System, Boston, MA; Boston University Chobanian & Avedisian School of Medicine, Department of Psychiatry, Boston, MA

Rui Zhang: National Center for PTSD, Behavioral Sciences Division, VA Boston Healthcare System, Boston, MA

Brent Rose: Department of Radiation Medicine and Applied Sciences, University of California San Diego, La Jolla, CA; Veterans Affairs San Diego Healthcare System, San Diego, CA

Julie A. Lynch: VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT; Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT; Department of Nursing and Health Sciences, University of Massachusetts, Boston, Boston, MA

Richard L. Hauger: Veterans Affairs San Diego Healthcare System, San Diego, CA; Center for Behavioral Genetics of Aging, University of California San Diego, La Jolla, CA

Introduction



Research suggests that testosterone may be protective against Alzheimer's disease and related dementias (ADRD). The homozygous variant (CC) genotype of HSD3B1 (1245C; rs1047303) leads to increased androgen synthesis compared with the wild-type (AA) genotype. We explored the association between this genotype and ADRD by evaluating rates of ADRD outcomes by HSD3B1 genotype in Veterans receiving care through the Veterans Health Administration (VHA).



Methods



We included Veterans with prostate cancer enrolled in the Million Veteran Program (MVP) who had available baseline survey and genetic data. We excluded patients with <10 total VHA visits or who were <60 years old at MVP enrollment. ADRD was defined as a composite endpoint of AD and dementia diagnosis codes, requiring at least two qualifying diagnosis codes. Time to event was defined as the time from MVP enrollment to ADRD diagnosis or censoring. We compared baseline characteristics using chi-square tests. We used Cox proportional hazards regression to measure associations between HSD3B1 genotypes and outcomes, adjusting for age, Charlson Comorbidity Index, Harmonized Ancestry and Race/Ethnicity (HARE), income, education, and APOE4 carrier status (rs429358 and rs7412).



Results



In total, 8,440 patients were homozygous AA, 6,447 were heterozygous AC, and 1,342 were homozygous CC. Median follow-up was 7.37 years. The CC genotype was more common in patients of European ancestry (10.7%) than in those of African (1.4%), Asian (0.0%), or Hispanic (5.1%) ancestry (p<0.001). Cumulative incidence of ADRD at five years was 3.5%, 3.6%, and 2.7% in the AA, AC, and CC groups, respectively (p=0.10). Multivariable Cox regression showed a decreased risk of ADRD in patients with the CC genotype (hazard ratio: 0.74, p=0.04).



Conclusions



The HSD3B1 CC genotype is associated with a decreased risk of ADRD in this cohort of prostate cancer patients after controlling for other factors. Our results support the hypothesis that the HSD3B1 CC genotype may confer differential testosterone regulation with downstream physiological effects.
 
28 177 Jason Sa Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease Poster only Sa Comprehensive molecular characterization of Korean advanced pan-cancer patients facilitates precision medicine Jason K. Sa Department of Biomedical Sciences, Korea University College of Medicine Precision oncology centers on identifying therapeutically exploitable targets that give individual patients the opportunity to make informed, personalized treatment decisions. To bring these concepts into clinical practice, several large-scale genomic studies have been initiated to identify essential molecular aberrations and explore their functional impacts across a wide spectrum of tumor types. However, because the vast majority of patients enrolled in these studies are of European ancestry, their insights cannot readily be applied to the treatment of East Asian cancer patients. To address this challenge, we collected and analyzed the genomes of 4,028 Korean patients with advanced cancers across tumor types. Considerable genomic diversity existed at both the pan-cancer and individual tumor levels between patients of different ethnic origins; for example, mutations in key chromatin remodeling genes, including IDH1, were observed only in cholangiocarcinoma patients of European origin, whereas Korean cancer patients were marked by recurrent alterations in KRAS and TP53. Furthermore, Korean patients were characterized by mutations in genes encoding mismatch repair (MMR) proteins, which in turn led to increased activity of MMR-deficiency (MMRd) mutational signatures. Lastly, we provide a clinical proof of concept in which patients with liver metastases carrying a PIK3CA-activating mutation showed a remarkable response to PI3K-targeted therapy.
Collectively, our results highlight the importance of an ethnicity-aware personalized approach to cancer therapy.
29 191 Matt Wright Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease Poster only Wright Batch ClinVar submission in ClinGen's Variant Curation Interface (VCI) Matt W. Wright, Christine G. Preston, Gloria Cheung, Mark E. Mandell, Bryan Wulf, Lawrence Babb, Marina DiStefano, Steven M. Harrison, Clarissa J. Klein, Rachel Shapira, Ingrid M. Keseler, Deborah I. Ritter, Neethu Shah, Kevin Riehle, Aleksandar Milosavljevic, Sharon E. Plon, Teri E. Klein Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA,

Molecular and Human Genetics Department, Baylor College of Medicine, Houston, TX,

Medical and Population Genetics, Broad Institute of MIT & Harvard, Cambridge, MA,

Precision Health Program, Geisinger, Danville, PA,

Pediatrics-Hematology-Oncology, Baylor College of Medicine, Houston, TX

The NIH-funded Clinical Genome Resource Consortium Variant Curation Interface (ClinGen VCI) is a global, open-source variant classification platform that supports the application of evidence criteria and the classification of variants based on the ACMG/AMP sequence variant classification guidelines. To facilitate evidence-based improvements in human variant classification, the VCI is publicly available to the worldwide genomics community. The VCI is one of a suite of tools developed by ClinGen and supports the FDA-recognized variant curation process of ClinGen Variant Curation Expert Panels (VCEPs). ClinGen is expanding to involve more curators and curator teams (affiliations), scaling up its activities to increase the number of variant curations across a growing number of genes. The variant curation workflow is intended to support dissemination of variant curations to two repositories: the ClinGen Evidence Repository (for approved ClinGen VCEPs) and ClinVar. Here we present the first in a series of planned software features to improve the ClinVar submission experience for VCI users. This batch submission feature allows users to create an active batch of curated variants and then download a preformatted file of the curation data that can easily be submitted to ClinVar, speeding up the submission of variant interpretations. It also allows VCI users to annotate which of their curations have been submitted to ClinVar. Planned ClinVar submission features for the VCI include real-time feedback on the status of users' ClinVar submissions, as well as direct API-based submission from the VCI. Together, these features will further streamline workflows for both ClinGen VCEPs and non-ClinGen VCI users and advance ClinGen's goal of creating scalable curation workflows to support the clinical genomics community.

https://doi.org/10.7490/f1000research.1119652.1
30 205 Alison Ziesel Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease Poster only Ziesel Characterization of synonymous mutation impact on SARS-CoV-2 genomic organization Alison Ziesel, Hosna Jabbari University of Alberta, University of Alberta The spread of SARS-CoV-2 and the associated COVID-19 infection will continue to be a problem for the foreseeable future. Much work has been done to characterize emerging variants in terms of their protein-altering mutations, especially Spike protein-altering mutations, but less attention has been given to non-protein-altering synonymous mutations. In this work, we assessed six Variants of Concern for their non-synonymous and synonymous mutational content and predicted the effects of those mutations on potential genomic RNA secondary structure. We find that non-synonymous (protein-altering) mutations are more prevalent than synonymous mutations in these variants, including within protein-coding regions, and that synonymous mutations typically have a modest impact on RNA secondary structure. We argue that this indicates the importance of RNA structure, including within coding regions, to viral fitness.
31 190 Tina Hernandez-Boussard Workshop: Large Language Models (LLMs) and ChatGPT for Biomedicine Poster only Hernandez-Boussard Few-shot Learning with Large Language Models to Extract Fall Events for Older Americans from Clinical Narratives Malvika Pillai, Joe Studnia, Ines Dormey, Catherine Curtin, Tina Hernandez-Boussard Stanford University, School of Medicine Falls pose a significant threat to the well-being of older Americans, with increasing mortality rates and substantial economic burdens. Post-surgical falls offer a crucial window for targeted intervention, yet identifying at-risk individuals remains challenging. This study addresses this gap by leveraging large language models, including BioClinical Bidirectional Encoder Representations from Transformers (BioClinicalBERT), Large Language Model Meta AI (LLaMA), and Generative Pre-trained Transformer for Biomedical Text Generation and Mining (BioGPT). We compare their performance in extracting fall events from clinical narratives, a task crucial for pre-surgical risk assessment. Our models were fine-tuned on a sample of annotated notes, and their deployment on a comprehensive patient cohort demonstrated promising results. LLaMA-7B exhibited high precision, while BioClinicalBERT demonstrated superior recall and area under the receiver operating characteristic curve (AUROC). This approach, which uses readily available language models, paves the way for scalable and accurate fall event extraction from clinical notes. The findings not only contribute to enhancing post-surgical fall risk assessment but also shed light on the potential of few-shot learning in healthcare applications.
32 204 Sanya Taneja Workshop: Large Language Models (LLMs) and ChatGPT for Biomedicine Poster only Taneja Information Extraction from Biomedical Texts using Large Language Models for Natural Product-Drug Interactions Sanya B. Taneja, Sonish Sivarajkumar, Yanshan Wang, Richard D. Boyce University of Pittsburgh, University of Pittsburgh, University of Pittsburgh, University of Pittsburgh Transformer-based large language models (LLMs), such as BERT, GPT, and Llama2 models, are the current state of the art for named entity recognition (NER) and relation extraction. LLMs can also be used for information extraction beyond NER and relation extraction, including extracting the context associated with events in literature-based discovery and aggregating data from clinical studies. We aim to evaluate the application of LLMs to extract important elements from clinical studies (e.g., Population, Comparator, Outcome, Results) and drug pharmacokinetic parameters (e.g., half-life, area under the concentration-time curve, plasma concentration, and inhibition constant) from unstructured text and tables in biomedical studies. In preliminary experiments, we applied the PubMedBERT model to annotated sentences from drug-drug interaction studies to identify these key parameters, and then evaluated its performance on a hold-out test set. We compared the performance of PubMedBERT from Hugging Face with that of GPT-3.5 and GPT-4 models (OpenAI API) using zero-shot prompting on the same test data. The PubMedBERT model, fine-tuned on 208 sentences, achieved 92.5% accuracy and an F1-score of 0.70 on the 53-sentence hold-out test set. PubMedBERT can extract multiple types of parameters with a single model, and it performs better when prediction is framed as entity tagging than as multi-label classification. By contrast, GPT-3.5 achieved 54.72% accuracy on the same test set, with frequent hallucinated parameter types.
GPT-4 achieved 77.36% accuracy with fewer hallucinations, but the GPT models struggled to accurately identify parameter spans in sentences. Although GPT-4 can retrieve relevant information from limited context, manual review will be required when using it for information extraction. Future experiments will explore information extraction from tables, models such as Llama2, and additional parameters (e.g., dosage), and will aim to enhance PubMedBERT’s performance using data annotated by LLMs with larger context windows and different prompting strategies, with the goal of improving information extraction from natural product-drug interaction studies. https://sanyabt.github.io/files/talks/PSB_posterv1.pdf
33 209 Xiangru Tang Workshop: Large Language Models (LLMs) and ChatGPT for Biomedicine Poster only Tang BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein

Yale University, Yale University, Yale University, Yale University, Google DeepMind, Yale University Pre-trained large language models have significantly improved code generation. As these models scale up, there is an increasing need for their output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics because of the amount of specialized domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate large language models (LLMs) in generating bioinformatics-specific code. BioCoder spans a broad spectrum of the field and covers cross-file dependencies, class declarations, and global variables. It incorporates 1,026 Python functions and 1,243 Java methods extracted from GitHub, along with 253 examples from the Rosalind Project, all pertaining to bioinformatics. Using topic modeling, we show that the included code is representative of the full spectrum of bioinformatics calculations. BioCoder incorporates a fuzz-testing framework for evaluation. We have applied it to evaluate many models, including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT-4. Furthermore, we fine-tuned StarCoder, demonstrating how our dataset can effectively enhance the performance of LLMs on our benchmark (by >15% in terms of Pass@K in certain prompt configurations and by >3% in all of them). The results highlight two key aspects of successful models: (1) they accommodate long prompts (> ~2,600 tokens) containing the full context needed to resolve functional dependencies, and (2) they contain specific domain knowledge of bioinformatics beyond general coding knowledge. This is evident from the performance gain of GPT-3.5/4 compared with the smaller models on the benchmark (50% vs. up to ~25%).
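BioCoder's results are reported as Pass@K, but the abstract does not say how Pass@K is computed. A common estimator, assumed here rather than confirmed as BioCoder's, generates n samples per problem, counts the c samples that pass the tests, computes pass@k = 1 − C(n−c, k)/C(n, k), and averages over problems. The sketch below implements that estimator; the function names and the example counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass (assumed estimator)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k draw contains a passing sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample is C(n-c, k) / C(n, k).
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (samples generated, samples passing) counts for three problems; not BioCoder data.
example = [(20, 0), (20, 5), (20, 20)]
print(round(mean_pass_at_k(example, k=1), 4))  # 0.4167
```

Computing the estimate from n ≥ k samples, rather than drawing exactly k samples once, reduces the variance of the score for a fixed sampling budget.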

https://arxiv.org/abs/2308.16458
34 188 John Kwagyan Workshop: Risk prediction: Methods, Challenges, and Opportunities Poster only Kwagyan Development of Predictive Models for Ophthalmologic Data Structures with Binary Outcomes John Kwagyan, Nana Yaw Osafo, William Southerland Howard University College of Medicine Applying appropriate statistical strategies allows researchers to realize the full potential of data from paired ocular measurements, including those from prospective studies and GWAS. The nature of ocular measurements of paired eyes, however, poses challenges to researchers developing predictive models for joint inference on paired-eye data. First, measurements from the two eyes of a person are correlated. Second, data are available on one eye for some people and on both eyes for others. Moreover, case definitions vary from disease to disease: some require the presence of a clinical phenotype in one eye for categorization as a case, while others require the presence of the phenotype in both eyes. Classification of advanced age-related macular degeneration, for example, requires the presence of geographic atrophy and/or neovascularization in at least one eye, while classification of severe glaucoma requires the presence of optic nerve damage with loss of vision in both eyes. Furthermore, the advent of GWAS has permitted the evaluation of many variants that contribute to the etiology of eye disorders, posing the problem of testing multiple hypotheses. These different facets of data structure pose challenges that require advanced methodological approaches for prediction, inference, and interpretation. We develop a computationally tractable likelihood-based risk prediction model for making inference on paired, unpaired, and combined paired-unpaired data structures that commonly arise in ocular research, correcting for the correlation between eyes and allowing for the inclusion of risk factors.
Our initial development will focus on dichotomous outcomes, with an extension to the simultaneous assessment of bivariate outcomes. We illustrate the practicability of our models using the comprehensive datasets from the Age-Related Eye Disease Study, a well-designed, NIH/NEI-sponsored prospective clinical trial with rich phenotypic, clinical, and genetic data. This work is supported by NIH-NIMHD Grant #U54MD007597.

https://f1000research.com/posters/12-1502
35 203 Andrew Latham Workshop: Tools for assembling the cell: Towards the era of cell structural bioinformatics Poster only Latham Integrative spatiotemporal modeling of biomolecular processes Andrew P. Latham, Jeremy O. B. Tempkin, Shotaro Otsuka, Wanlu Zhang, Tugce Yenice, Jan Ellenberg, and Andrej Sali University of California- San Francisco, University of California- San Francisco, European Molecular Biology Laboratory- Heidelberg; Germany, European Molecular Biology Laboratory- Heidelberg; Germany, European Molecular Biology Laboratory- Heidelberg; Germany, European Molecular Biology Laboratory- Heidelberg; Germany, University of California- San Francisco Dynamic processes involving biomolecules are essential to the function of the cell. We have developed an integrative method for building spatiotemporal models of biomolecular systems based on multiple heterogeneous sources of information, including time-resolved experimental data and physical models of macromolecular dynamics. Our method models time-dependent processes by first computing ensembles of integrative structural models at fixed time points and then connecting those snapshots into an integrative spatiotemporal model of the process of interest. We have demonstrated the practical utility of our method by applying it to the assembly of the human Nuclear Pore Complex (NPC) in the reforming nuclear envelope after mitotic cell division, based on live-cell correlated electron tomography, bulk fluorescence microscopy, and a structural model of the fully assembled NPC. Our integrative spatiotemporal model of NPC assembly improves the precision, accuracy, and completeness of the resulting model relative to models that can be derived using conventional integrative modeling approaches alone.
Our method is applicable to a wide range of time-dependent systems in structural biology, and is available to the broader scientific community through an implementation in the open source Integrative Modeling Platform (IMP). https://ucsfonline-my.sharepoint.com/:b:/g/personal/andrew_latham_ucsf_edu/EcOF1v8Bd7tOm7YPGSvjWHQBYNOWyieRIaMjizApyIW8pw?e=l1XgZ8
36 221 Trang Le Workshop: Tools for assembling the cell: Towards the era of cell structural bioinformatics Poster only Le Proteome profile across cell shape continuum Trang Le, Matheus Viana, William Leineweber, Wei Ouyang, Susanne Rafelski, Emma Lundberg Department of Bioengineering Stanford University (USA),

Allen Institute for Cell Science (Seattle WA USA),

Department of Bioengineering Stanford University (USA),

KTH Royal Institute of Technology (Stockholm Sweden),

Allen Institute for Cell Science (Seattle WA USA),

Department of Bioengineering and Department of Pathology Stanford University (USA) and KTH Royal Institute of Technology (Stockholm Sweden)
Cells come in a variety of shapes that underpin homeostasis and are crucial for carrying out the essential functions that maintain the body's internal balance. The diversity of cellular and tissue structures can arise from a few basic cell shapes, which undergo various transformations based on biophysical constraints on cytoskeletal organization. Different cell shapes may also indicate distinct signaling or transcriptional states. Cellular geometry has been suggested to convey spatial information that guides processes such as polarity, signaling, morphogenesis, and division-plane positioning (Haupt & Minc, J Cell Sci 2018). In 3D cultures of human iPS cells, Viana et al. (Nature 2023) measured and compared the shapes of cells and the locations of internal structures, each marked by a single representative protein. However, how the whole proteome is orchestrated to regulate cell morphology, and hence function, remains poorly understood. In this study, using up to 10,000 subcellular proteins spanning 30 organelles across 10 cell lines in the Human Protein Atlas database, we aim to decipher the natural range of cell shape variation and the corresponding spatial variation of the proteome.
37 232 Ernst H Pulido Workshop: Tools for assembling the cell: Towards the era of cell structural bioinformatics Poster only Pulido Integrative Analysis of Protein-Protein Interactions Using Cross-Linking Mass Spectrometry and Structural Modeling Ernst H. Pulido, Anthony J. Cesnik, Trang Le, Frederic Ballllosera, Emma Lundberg Biophysics Program - Department of Biology - Stanford University (EHP, EL), Bioengineering Department - Stanford University (AJC, TL, FB, EL), Department of Pathology - Stanford University (EL), Chan-Zuckerberg Biohub - San Francisco (EL) Protein-protein interactions (PPIs) underpin most biological systems and have an impact on health, disease, and biotechnological advances. Traditional methods such as affinity purification and immunoprecipitation mass spectrometry (AP-MS and IP-MS) often miss aspects of PPIs, such as transient or weak interactions. Furthermore, computational methods alone often fall short in accurately and consistently modeling complex structures. Cross-linking mass spectrometry (XL-MS) has emerged as an essential proteomic tool that pinpoints interactions by covalently linking protein residues that lie within a defined distance of each other. Here, we present a method that combines XL-MS and integrative modeling to deepen our understanding of protein complexes and inform advances in health and disease research.



Our study uses XL-MS data from independent labs and spatial imaging from the Human Protein Atlas (HPA) to locate cellular protein interactions. Resources such as the Atlas of Human Cell Morphology and cBioPortal supplement this study by helping to identify disease-related protein interactions. To model PPI structures, we employ tools such as AlphaFold2 and HADDOCK 2.4 to generate structural predictions of protein complexes that agree with the physical distances observed by XL-MS. We present a comprehensive analysis of protein interactions across various biological systems, including validation against protein and DNA variance profiles, hydropathy indexes, and other biophysical parameters. Our findings shed light on potential structures for protein complexes such as COA3-COX5B, HMGB2-HMGB3, and LDHA-LDHB.
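Checking whether a predicted complex agrees with XL-MS data usually comes down to measuring, for each cross-linked residue pair, the distance between the two residues in the model and comparing it with the maximum span allowed by the cross-linker. The sketch below illustrates that check under stated assumptions: it uses Cα coordinates, a hypothetical 30 Å Cα–Cα cutoff (the appropriate threshold depends on the cross-linker), and made-up residue identifiers and coordinates. It is not the authors' validation pipeline.

```python
import math

Coord = tuple[float, float, float]
Residue = tuple[str, int]  # (chain ID, residue number)


def crosslink_satisfaction(
    ca_coords: dict[Residue, Coord],
    crosslinks: list[tuple[Residue, Residue]],
    max_distance: float = 30.0,  # assumed C-alpha/C-alpha cutoff in angstroms
) -> tuple[float, list[float]]:
    """Return (fraction of cross-links satisfied, per-link C-alpha distances).

    A cross-link counts as satisfied when the distance between the C-alpha
    atoms of the two cross-linked residues in the model is <= max_distance.
    """
    if not crosslinks:
        raise ValueError("no cross-links given")
    distances = [math.dist(ca_coords[a], ca_coords[b]) for a, b in crosslinks]
    satisfied = sum(d <= max_distance for d in distances)
    return satisfied / len(crosslinks), distances


# Hypothetical model coordinates and cross-links between chains A and B.
model = {("A", 10): (0.0, 0.0, 0.0), ("B", 25): (12.0, 16.0, 0.0), ("B", 40): (30.0, 40.0, 0.0)}
links = [(("A", 10), ("B", 25)), (("A", 10), ("B", 40))]
fraction, dists = crosslink_satisfaction(model, links)
print(fraction, dists)  # 0.5 [20.0, 50.0]
```

In practice, the coordinates would be parsed from a predicted structure (for example, an AlphaFold2 or HADDOCK model), and the satisfied fraction can be used to rank alternative models.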



Our approach combines cross-link data with computational predictions and diverse datasets to effectively model complex protein structures. We aim to map cellular PPIs to provide a clearer picture of their collective functions, and employ molecular dynamics simulations for detailed insights into molecular interactions. This holistic method improves bioinformatic predictions and provides a comprehensive view of PPI networks based on their physicochemical properties.

 
38 200 Khandakar Tanvir Ahmed General Poster only Khandakar Tanvir Ahmed DTI-LM: Language Model Powered Drug-Target Interaction Prediction Khandakar Tanvir Ahmed, Wei Zhang Department of Computer Science, University of Central Florida The exploration of drug-target interactions (DTIs) not only expedites the identification of novel drug candidates but also augments our capacity to repurpose existing compounds for diverse therapeutic applications. DTI prediction is an indispensable tool in the initial stages of drug discovery: by rapidly identifying potential drug-target interactions, it streamlines lead compound selection and, consequently, experimental validation. Numerous studies have demonstrated the utility of computational approaches, including machine learning algorithms, network-based methods, and molecular docking simulations, for DTI prediction. Recently, DTI prediction has advanced rapidly, driven primarily by the extensive accumulation and accessibility of biomedical datasets and by progress in deep learning techniques. Several advanced deep learning-based frameworks for DTI prediction have emerged that use diverse sets of data as input. These frameworks can be broadly categorized into knowledge graph-based methods, 3D structure-based approaches, 2D pairwise distance map-based techniques, and 1D sequence-based methods. Heterogeneous knowledge graph (KG)-based methods have succeeded in various DTI prediction scenarios, including warm start, cold start for drugs, and cold start for proteins. Cold-start predictions involving unknown drugs or proteins are particularly challenging because little or no information about the drug or protein is available during model training. On the other hand, 1D sequence-based methods are computationally more efficient than KG-based methods but struggle with cold-start predictions.
In this study, we introduce DTI-LM, a model for predicting DTIs that leverages language models to generate encodings from 1D protein and drug sequences. We enhance the encoding process with graph attention networks (GATs) for more nuanced, context-aware DTI predictions. A comprehensive set of experiments shows improved DTI prediction with our proposed model. https://drive.google.com/file/d/184ErdZQl1QaWQNhbISE-CJS_kFFQXKvM/view?usp=sharing
39 230 Zarif Azher General Poster only Azher Preliminary Multimodal Deep Learning Investigation of Tumor Immune Microenvironment Cell-Type Deconvolution for Colorectal Cancer Prognostication  Zarif L Azher, Salban Nithilaselvan, Tristan DeVictor, Eric Zhang, Tanay Panja, Ji-Qing Chen, Brock Christensen, Lucas Salas, Louis Vaickus, Joshua Levy Thomas Jefferson High School for Science and Technology, Thomas Jefferson High School for Science and Technology, Thomas Jefferson High School for Science and Technology, Thomas Jefferson High School for Science and Technology, Thomas Jefferson High School for Science and Technology, Dartmouth College, Dartmouth College, Dartmouth College, Dartmouth Health, Cedars Sinai Medical Center Background: Deep learning technologies can learn complex patterns from input patient data (e.g., histopathology, DNA methylation) to predict clinically useful cancer-related endpoints (e.g., survival). Multimodal methods fuse information from heterogeneous modalities to make more informed predictions. The tumor microenvironment (TME) is characterized by a mosaic of immune cell types whose composition, which can be deconvolved from “omics” data, may hold clues regarding a patient’s unique disease profile and progression. Existing deep learning-driven multimodal methods do not incorporate this important data type into prognostication models.



Methods: Here, we develop a multimodal deep learning model that integrates patient histopathological imaging, bulk DNA methylation profiling, and the cellular proportions of 17 lineages deconvolved using HiTIMED to predict colorectal cancer patient prognosis (i.e., instantaneous risk of death). Data were collected and preprocessed from 414 patients in The Cancer Genome Atlas (TCGA). Unimodal prognostication models were trained to derive numerical representations from each modality. The final multimodal model fused these representations to learn a joint information space and subsequently predict patient prognosis. Models were assessed using the concordance index (c-index) measured on a held-out test partition.
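As a reference point for the c-index values reported in this abstract, here is a minimal sketch of Harrell's concordance index. The abstract does not specify which c-index variant or library was used. This sketch assumes right-censored data in which a pair is comparable when the subject with the shorter follow-up had an observed event. For simplicity, it skips pairs with tied follow-up times and counts tied predictions as half-concordant.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times; events: 1 if death was observed, 0 if censored;
    risk_scores: predicted risk, where higher means earlier expected death.
    Pairs with tied follow-up times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # only a subject with an observed event can anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not TCGA): four patients, one of them censored.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [4.0, 3.0, 2.0, 1.0]))  # 1.0: perfectly concordant
print(harrell_c_index(times, events, [0.5, 0.5, 0.5, 0.5]))  # 0.5: constant predictions
```

A c-index of 0.5 corresponds to an uninformative ranking and 1.0 to a perfect one, which is the scale on which the unimodal and multimodal scores in this abstract should be read.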



Results and Conclusions: Unimodal models using histopathology (0.593 ± 0.003), DNA methylation (0.615 ± 0.001), and immune cell proportions (0.623 ± 0.003) were similarly powerful independent predictors of prognosis, as measured by c-index. The multimodal model (0.672 ± 0.002) yielded an average performance increase of 4.4%. This performance was comparable to, and potentially better than, that of other reported multimodal colorectal cancer prognostication methods (0.640 ± 0.007, 0.650 ± 0.005). Our results demonstrate that TME-related cell profiles should be considered a capable modality for independent and joint investigation of actionable cancer outcome prediction. Future work will apply this modality to additional predictive tasks and conduct ablation studies to understand how TME data can uncover novel biological insights.

https://f1000research.com/posters/12-1559
40 194 Haimeng Bai General Poster only Bai Feature Ranking using Machine Learning for OPNA ASO Design Haimeng Bai Ph.D., Sandra P. Smieszek Ph.D., Bartlomiej Przychodzen Ph.D., Christos M Polymeropoulos M.D., Gunther Birznieks M.S., Mihael H Polymeropoulos M.D.



Vanda Pharmaceuticals Inc., Washington, DC, Vanda Pharmaceuticals Inc., Washington, DC, Vanda Pharmaceuticals Inc., Washington, DC, Vanda Pharmaceuticals Inc., Washington, DC, Vanda Pharmaceuticals Inc., Washington, DC, Vanda Pharmaceuticals Inc., Washington, DC Antisense oligonucleotides (ASOs) offer a powerful tool for manipulating gene expression, and researchers are exploring nucleobase modifications to enhance their efficacy. However, the design of these modifications, including their positions and quantities, still relies heavily on empirical insights. A deeper understanding of the factors influencing ASO efficiency is essential. Machine learning (ML) techniques, widely applied in bioinformatics, have shown promise in the design of oligonucleotides. In this study, we leveraged ML methods to rank features influencing ASO knockdown efficiency, aiming to better understand their impact and improve ASO design.

Training set ASOs were designed based on domain knowledge. Gene expression was assessed by qPCR in ASO-treated cell lines. We constructed a matrix of sequence-based features for modeling. Given our limited sample size, we used regularized linear regression models, supplemented by bootstrapping to improve the robustness of the estimates.

In the proof-of-concept STMN2 knockdown study (n=53), Elastic Net regression on 1000 bootstrap samples yielded a mean MSE of 0.104 (0.101, 0.107) and a mean R² of 0.893 (0.889, 0.897). Predictions using the top-ranked coefficients resulted in a Pearson correlation coefficient of 0.726. The top-ranked features indicated that a 5’ A modification and overall T count increase fold change, while a core-region G modification decreases it. Extending the study to the ongoing JAK2 knockdown project (n=12), we employed Lasso regression, obtaining a mean MSE of 0.002 (1.86e-21, 0.016) and a mean R² of 0.948 (0.64, 1). Predictions were not computed because of the small sample size. The top-ranked features suggested that a 5’ G modification influences fold change the most.
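The bootstrapped Elastic Net ranking described above can be sketched in pure Python. This is a minimal sketch under stated assumptions: coordinate descent on the scikit-learn-style objective, centred (not standardised) features, fixed `alpha` and `l1_ratio` rather than cross-validated values, and features ranked by the absolute value of their mean bootstrap coefficient. The function names are hypothetical and this is not the authors' pipeline.

```python
import random


def soft_threshold(value, threshold):
    """Lasso shrinkage operator S(value, threshold)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + alpha * (l1_ratio * ||b||_1 + (1 - l1_ratio) / 2 * ||b||^2).
    X and y are centred internally, so the unpenalised intercept drops out."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # equals y_c - Xc @ beta while beta == 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature in this resample: keep it at zero
            # Correlation of feature j with the partial residual that excludes it.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new_b - beta[j]
            if delta:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new_b
    return beta


def bootstrap_feature_ranking(X, y, names, n_boot=1000, seed=0, **fit_kwargs):
    """Refit on bootstrap resamples; rank features by |mean coefficient|."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    sums = [0.0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        beta = elastic_net([X[i] for i in idx], [y[i] for i in idx], **fit_kwargs)
        for j in range(p):
            sums[j] += beta[j]
    means = [(names[j], sums[j] / n_boot) for j in range(p)]
    # The sign is kept so the direction of each feature's effect stays visible.
    return sorted(means, key=lambda item: abs(item[1]), reverse=True)
```

Keeping the signed mean coefficient is what allows statements such as "increases fold change" versus "decreases it"; with very small sets such as the JAK2 study, setting `l1_ratio=1.0` turns the same code into a Lasso fit.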

ASO-mediated knockdown primarily involves disrupting pre-mRNA splicing, a complex process with multiple contributing factors, including the blocked sequence, the binding position, and the type and/or number of nucleobase modifications. Our study highlights the utility of ML for ranking influential features and reports the most impactful features for future experimental validation.

 
41 235 Gurkan Bebek General Poster only Bebek Unraveling Proteomic and Pathological Heterogeneity in Alzheimer's Disease and Dementia with Lewy Bodies Gurkan Bebek Case Western Reserve University Dementia is a global challenge, affecting approximately 50 million people. Alzheimer's disease (AD), its leading cause, is marked by abnormal accumulation of amyloid beta (Aβ) and tau proteins. Dementia with Lewy bodies (DLB), characterized by α-synuclein (α-syn) inclusions, often coexists with AD pathology, so mixed pathology is common.



Developing innovative treatments requires a thorough understanding of the complex biological landscape of AD. Recent cerebrospinal fluid (CSF) proteomic studies identified over a thousand proteins with distinct expression levels, revealing AD subtypes associated with survival, hyperplasticity, blood-brain barrier dysfunction, and innate immune activation. Genetic analyses showed heightened AD risk across these subtypes, and non-demented individuals with AD showed an increased risk of clinical decline.



Building on these findings, more recent CSF proteomics in AD and DLB identified additional AD subtypes and shed light on the molecular constituents contributing to Lewy body (LB) formation. By integrating LB proteomics with previous signatures, we aim to correlate the identified proteins, including kinases and ubiquitin-related enzymes, with the pathophysiology of related neurodegenerative disorders. This exploration provides insights into the regulation of α-synuclein and LB proteins that may contribute to Lewy body formation and α-synuclein toxicity.



Furthermore, we refine the characterization of AD subtypes using CSF proteomics, incorporating genetic analyses to understand their correlation with heightened AD risk and clinical outcomes. This integrated approach seeks to unravel the intricate molecular and pathological heterogeneity within AD and DLB, offering a comprehensive perspective for advancing targeted therapeutic interventions.
https://systems.bio/papers/2023/12/04/PSB2024
42 227 Steven Brenner General Poster only Chandonia The Demise of Structural Genomics has Impaired the Ability of Deep Learning Algorithms to Model the Human Proteome, Confounding Interpretation of Genetic Variants John-Marc Chandonia, Steven E. Brenner John-Marc Chandonia: Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley CA 94720, USA.

Steven E. Brenner: Center for Computational Biology and Department of Plant and Microbial Biology, University of California, Berkeley CA 94720, USA.

Each month, 1,200 new protein structures are solved. However, <2% represent the first solved structure from a Pfam family; 98% are from families with another known structure and typically share that structure's fold. By contrast, the first structure from a family reveals its fold and enables inference of ancient relationships to other proteins. Structurally characterized families were needed to train the deep learning algorithms that have made breakthroughs in accurate structure prediction. Structural Genomics was directed at solving such structures. Between 2003 and 2007, the output of Structural Genomics centers peaked at 25 newly characterized families per month, and traditional laboratories solved a similar number; together, these ~50 represented ~10% of all structures solved at that time. Since then, most structural characterization has focused on detailed understanding. The decline is not only in the fraction of new families being characterized but also in the absolute number, which is now 15 per month.



The deteriorating rate of new family structural characterization impacts interpretation of human genetic variants.  Among the 77% of the human proteome amenable to structural characterization (excluding low complexity, disordered, and coiled coil regions), 61% of residues are in regions homologous to a known structure.  Had Structural Genomics continued with productivity comparable to 2007, >90% would likely be in regions homologous to a structure.  At current rates, achieving 90% will take two decades.



Deep learning methods have also improved structural knowledge of the human proteome: AlphaFold models 77% of residues amenable to structural characterization with “very high confidence” (pLDDT >90). 85% of residues in structurally characterized Pfam families are modeled with “very high confidence.” This fraction drops to 32% elsewhere, reflecting AlphaFold’s substantial dependence on known homologous structures. Therefore, structurally characterizing new families is even more valuable today than 20 years ago, as the resulting structures can be leveraged by deep learning to model homologous proteins with high accuracy.
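The 85% versus 32% comparison amounts to stratifying per-residue AlphaFold confidence by whether a residue lies in a structurally characterized Pfam family. A minimal sketch, assuming each residue has already been annotated with its pLDDT and a boolean family flag; the input format and function name are hypothetical.

```python
def very_high_confidence_fraction(residues, threshold=90.0):
    """Fraction of residues with pLDDT > threshold, split by family status.

    `residues` is an iterable of (plddt, in_characterized_family) pairs.
    """
    confident = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for plddt, in_family in residues:
        key = bool(in_family)
        total[key] += 1
        if plddt > threshold:  # AlphaFold's "very high confidence" band
            confident[key] += 1

    def ratio(key):
        return confident[key] / total[key] if total[key] else float("nan")

    return {"characterized_family": ratio(True), "elsewhere": ratio(False)}
```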

 
43 222 Anthony Cesnik General Poster only Cesnik Deciphering hierarchical cell cycle controls with phosphoproteomics Anthony J. Cesnik, Christian Gnann, Frank McCarthy, Dan N. Itzhak, Emma Lundberg Bioengineering Department - Stanford University (AJC; EL), Chan-Zuckerberg Biohub - San Francisco (AJC; CG; FM; DNI; EL), SciLifeLab and KTH Royal Institute of Technology - Stockholm Sweden (AJC; CG; EL)

The cycle of cellular division is a tightly coordinated process that involves the cyclical expression of thousands of genes, proteins, and post-translational modifications (PTMs). Recently, we identified hundreds of proteins with newly discovered associations to the cell cycle by interrogating their expression at the single-cell level. We showed that the large majority of these novel proteins were not regulated at the level of transcript expression, and instead found indications that they are regulated post-translationally, for example by phosphorylation. In this study, we perform a deep phosphoproteomic analysis of interphase, the period between cell divisions, which reveals that over 2,000 proteins have cell cycle-correlated phosphosites. These phosphoproteins have high functional significance and form an extended network of cell cycle-regulated phosphoproteins across many layers of cellular organization.
44 189 Juhee Choi General Poster only Park Optimizing Journey Arrangements Using Elastic Search and Live Consumer Insights Sooah Park, Juhee Choi Sangmyung Univ, Republic of Korea Our previous research introduced data-driven, enhanced travel planning using real-time user data. Building on that work, we give visitors to Jeju Island easy access to information about local restaurants, accommodations, and tourist attractions based on their location. On the first page of the website, users can select categories such as Work, Lodging, Traffic, and Tour from the menu bar. Selecting a category opens the corresponding Kibana dashboard. Users must log in using Guest mode, after which they can view the relevant data.

This helps travelers locate dining options, nearby accommodations, and popular attractions with less effort. It is especially useful for foreign tourists in Korea on working holidays or longer stays, who can more easily understand the local lifestyle and find suitable places to stay and eat. Long-term tourists and job seekers in different regions will also find the service valuable: tourists can explore the local culture, while job seekers can learn about the area's job opportunities.
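Location-based lookups of this kind map naturally onto an Elasticsearch `geo_distance` filter. A minimal sketch that only builds the query DSL body, assuming an index whose documents have a keyword field named `category` and a `geo_point` field named `location`; both field names, the default radius, and the function name are hypothetical, and the body would still have to be sent to the index's `_search` endpoint.

```python
def build_nearby_query(category, lat, lon, radius_km=2.0, size=20):
    """Elasticsearch query DSL: documents of one category within a radius,
    nearest first. Field names `category` and `location` are assumptions."""
    point = {"lat": lat, "lon": lon}
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"category": category}},  # exact match on a keyword field
                    {"geo_distance": {"distance": f"{radius_km}km", "location": point}},
                ]
            }
        },
        # Sort hits by distance from the user's position, closest first.
        "sort": [{"_geo_distance": {"location": point, "order": "asc", "unit": "km"}}],
    }
```

Using `filter` rather than `must` keeps the clauses out of relevance scoring, so the distance sort alone determines the order in which places appear.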
 
45 196 Jungmin Choi General Poster only Choi Identification of Potential Novel Genetic Variants in Pulmonary Arterial Hypertension through Whole Exome Sequencing in a Korean Population Moonyoung Lee

Jungmin Choi
Department of Biomedical Sciences, Korea University College of Medicine, Seoul, South Korea

Department of Biomedical Sciences, Korea University College of Medicine, Seoul, South Korea
Pulmonary arterial hypertension (PAH), a rare condition affecting 15 to 50 individuals per million, is characterized by narrowing, thickening, or hardening of the lung arteries. This pathological change impedes blood flow in the lungs and raises pressure within the pulmonary arteries. Although research advances have led to new drug treatments, such as one targeting the BMPR2 gene, these therapeutics neither offer a definitive cure nor have been tailored for non-Caucasian populations. Variants in known genes such as BMPR2 and EIF2AK4 fail to explain the cause in many patients, prompting us to search for novel pathogenic variants contributing to PAH.

We performed gene burden analysis on whole exome sequencing (WES) data from 89 Korean PAH patients. To focus on discovering new variants, we excluded 13 samples with known pathogenic variants and another 10 samples from 18 related pairs to ensure accurate gene burden computation, leaving 66 samples for downstream analysis.

We identified NR1H2, KANK1, and COL8A1 as significant genes using two distinct approaches: a control-free method and a case-control comparison. These genes were significant in the control-free test and had the lowest p-values in the case-control comparison. The genes harboring the detected variants are highly expressed in the arteries, endothelium, and lungs, suggesting that they are potential novel causal genes for PAH.
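The case-control arm of a gene burden analysis usually compares, gene by gene, how many cases versus controls carry a qualifying rare variant. A minimal sketch using a two-sided Fisher's exact test on that 2x2 carrier table; the choice of test and the table layout are assumptions, since the abstract does not name the statistic used, and the control-free approach is not covered here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: carriers, non-carriers of qualifying
    variants in one gene.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def pmf(k):
        # Hypergeometric probability of k carriers among the cases.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= observed * (1 + 1e-7))
    return min(p, 1.0)
```

Running this once per gene and then correcting for the number of genes tested (for example with Bonferroni) would give the per-gene p-values that a case-control ranking relies on.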
 
46 185 Caleb Class General Poster only Broyles Unraveling the mysteries of gene transcription activation with bioinformatics Bradley K. Broyles, Tamara Y. Erkina, Theodore P. Maris, Andrew T. Gutierrez, Daniel A. Coil, Thomas M. Wagner, Xiao Wang, Daisuke Kihara, Caleb A. Class, Alexandre M. Erkine Butler University Dept. of Pharmaceutical Sciences, Butler University Dept. of Pharmaceutical Sciences, Butler University Dept. of Pharmaceutical Sciences, Butler University Dept. of Pharmaceutical Sciences, Butler University Dept. of Pharmaceutical Sciences, Butler University Dept. of Pharmaceutical Sciences, Purdue University Dept. of Computer Science, Purdue University Dept. of Computer Science, Butler University Dept. of Pharmaceutical Sciences, Butler University Dept. of Pharmaceutical Sciences The mechanism of gene expression initiation by transcription activation domains (TADs) remains a mystery: TADs are short and intrinsically disordered, lack a specific sequence or structure, and exhibit “fuzzy” interactions with a variety of targets. We identified features that characterize TADs using bioinformatics and machine learning methods. First, two published data sets containing thousands of TADs selected from random sequences were re-analyzed and compared. More than 1% of random sequences proved to be functional as TADs, and regression analysis demonstrated that a greater abundance of acidic and aromatic amino acids conferred functionality, while basic residues were detrimental. A long short-term memory (LSTM) neural network was used to examine functional sequences in greater detail, improving prediction accuracy (AUC 0.98) and identifying rules for functionality, such as the need for both aromatic and acidic residues in a functional TAD, as well as positional preferences for certain types of residues.
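The regression result above rests on simple amino acid composition features. A minimal sketch of how such features might be computed for one candidate sequence; the residue groupings (D/E acidic, F/W/Y aromatic, K/R basic, histidine left out) and the function name are assumptions, not necessarily the authors' definitions.

```python
ACIDIC = frozenset("DE")
AROMATIC = frozenset("FWY")
BASIC = frozenset("KR")  # histidine deliberately excluded in this sketch


def composition_features(seq):
    """Per-sequence fractions of acidic, aromatic, and basic residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    acidic = sum(aa in ACIDIC for aa in seq)
    aromatic = sum(aa in AROMATIC for aa in seq)
    basic = sum(aa in BASIC for aa in seq)
    return {
        "acidic_frac": acidic / n,
        "aromatic_frac": aromatic / n,
        "basic_frac": basic / n,
        # Rule from the LSTM analysis: functional TADs need both residue classes.
        "has_acidic_and_aromatic": acidic > 0 and aromatic > 0,
    }
```

Fractions of this kind can feed a logistic regression directly, whereas the LSTM works on the raw sequence and can additionally pick up the positional preferences mentioned above.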
We conducted additional experiments to confirm and expand on these rules: surprisingly, TADs with only one aromatic and one acidic residue proved to be functional. Additionally, these experiments demonstrated that breaking an amphipathic alpha-helix with a proline increases the likelihood of TAD functionality. These findings strongly contradict the traditional recruitment model, emphasizing the need for additional work, thought, and open-mindedness in the area of gene transcription. https://f1000research.com/posters/12-1378
47 207 Amy Francis General Poster only Francis DrivR-Base: A Feature Extraction Toolkit For Variant Effect Prediction Model Construction Amy Francis, Colin Campbell, Tom R. Gaunt MRC Integrative Epidemiology Unit, Bristol Medical School (PHS), University of Bristol, Oakfield House, Oakfield Road, BS8 2BN, Bristol, United Kingdom,



Intelligent Systems Laboratory, University of Bristol, 1 Cathedral Square, BS1 5DD, Bristol, United Kingdom,



MRC Integrative Epidemiology Unit, Bristol Medical School (PHS), University of Bristol, Oakfield House, Oakfield Road, BS8 2BN, Bristol, United Kingdom,
Motivation: Recent advancements in sequencing technologies have led to the discovery of numerous variants in the human genome. However, understanding their precise roles in diseases remains challenging due to their complex functional mechanisms. Various methodologies have emerged to predict the pathogenic significance of these genetic variants. Typically, these methods employ an integrative approach, leveraging diverse data sources that provide critical insights into genomic function. Despite the abundance of publicly available data sources and databases, the process of navigating, extracting, and pre-processing features for machine learning models can be daunting. Furthermore, researchers often invest substantial effort in feature extraction, only to later discover that these features lack informativeness.

Results: In this paper, we present DrivR-Base, an innovative resource that efficiently extracts and integrates molecular information (features) for single nucleotide variants from a wide range of databases and tools, including AlphaFold, ENCODE, and Variant Effect Predictor. The resulting features can be used as input for machine learning models designed to predict the pathogenic impact of human genome variants in disease. Moreover, these feature sets have applications beyond this, including haploinsufficiency prediction and the development of drug repurposing tools. We describe the resource's development, practical applications, and potential for future expansion and enhancement.
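Integrating features from several tools amounts to joining per-source tables on a shared variant key. A minimal sketch, assuming each source has already been parsed into a dictionary keyed by `(chrom, pos, ref, alt)`; the data layout, source prefixes, and function name are illustrative and are not DrivR-Base's actual API.

```python
def merge_variant_features(sources):
    """Outer-join per-source feature tables on a shared variant key.

    `sources` maps a source name (e.g. "vep") to a dict of
    {(chrom, pos, ref, alt): {feature_name: value}}. Columns are prefixed with
    their source to avoid name clashes; missing values are left as None.
    """
    columns = sorted({f"{src}:{feat}"
                      for src, table in sources.items()
                      for feats in table.values()
                      for feat in feats})
    keys = sorted({key for table in sources.values() for key in table})
    matrix = {}
    for key in keys:
        row = dict.fromkeys(columns)  # every feature starts as missing (None)
        for src, table in sources.items():
            for feat, value in table.get(key, {}).items():
                row[f"{src}:{feat}"] = value
        matrix[key] = row
    return columns, matrix
```

Leaving gaps as `None` rather than filling them makes missingness explicit, so imputation can be chosen per model downstream.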

Availability and Implementation: DrivR-Base source code is available at https://github.com/amyfrancis97/DrivR-Base.



https://f1000research.com/posters/12-1521
48 225 Ishan Gaur General Poster only Gaur Single-cell Heterogeneity and Changes in the Multi-scale Architecture of the Cell across the Cell Cycle Ishan Gaur, Trang Le, Emma Lundberg Stanford EE, Stanford BioE, Stanford BioE It is well known that a large fraction of the human proteome displays significant cell-to-cell variability, but how this affects functional assemblies across biological scales is far less understood. In this work, we explore cell-cycle-dependent and single-cell heterogeneity in the U2-OS cell line, examining how changes in the expression and localization of proteins map onto the full spatial architecture of the cell. We do this by building MuSIC hierarchies for cells in G1, the G1-S transition, and G2. MuSIC maps integrate immunofluorescence imaging and affinity-purification mass-spectrometry data to produce a structural hierarchy of the components that form the cell. Using a pseudotime model trained to predict FUCCI marker dynamics from DAPI and γ-tubulin reference channels, we separate our imaging data by cell-cycle phase to make phase-specific maps. By examining these proteome-scale variations, we identify an essential substructure conserved across the cell cycle, as well as cell-cycle-dependent components that form, dissolve, and translocate between phases. We show that these changes occur at all scales, from assemblies of just a few proteins to large substructures of organelles. We also present preliminary results and discuss how much of all single-cell heterogeneity we can expect to be explained by cell-cycle-driven remodeling. Based on this work, we propose a general framework for understanding dynamic changes in the cell’s structural organization across conditions such as disease or drug perturbations, environmental changes, or heterogeneity between patients.
49 183 Olivier Gevaert General Poster only Gevaert Digital profiling of cancer transcriptomes from histology images with grouped vision attention Yuanning Zheng, Marija Pizurica, Francisco Carrillo-Perez, Humaira Noor, Wei Yao, Christian Wohlfart, Kathleen Marchal, Antoaneta Vladimirova, Olivier Gevaert

Department of Medicine, Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, Stanford, 94305, USA.

Internet technology and Data science Lab (IDLab), Ghent University, Technologiepark-Zwijnaarde 126, Ghent, 9052, Gent, Belgium.

Roche Molecular Systems, Inc., Santa Clara, CA.

Roche Diagnostics GmbH, Penzberg, Germany.

Department of Biomedical Data Science, Stanford University, Stanford, 94305, USA.



Cancer is a heterogeneous disease that demands precise molecular profiling for better understanding and management. RNA sequencing has emerged as a potent tool for unraveling transcriptional heterogeneity. However, large-scale characterization of cancer transcriptomes is hindered by cost and limited tissue accessibility. Here, we develop SEQUOIA, a deep learning model employing a transformer architecture to predict cancer transcriptomes from whole-slide histology images. We pre-train the model on data from 2,242 normal tissues, then fine-tune and evaluate it on 4,218 tumor samples across nine cancer types. The results are further validated in two independent cohorts comprising 1,305 tumors. The highest performance was observed in breast, kidney, and lung cancers, where SEQUOIA accurately predicted 13,798, 10,922, and 9,735 genes, respectively. The well-predicted genes are associated with the regulation of inflammatory response, the cell cycle, and hypoxia-related metabolic pathways. Leveraging the well-predicted genes, we develop a digital signature to predict the risk of recurrence in breast cancer. Although the model is trained at the tissue level, we showcase its potential for predicting spatial gene expression patterns using spatial transcriptomics datasets. SEQUOIA deciphers clinically relevant gene expression patterns from histology images, opening avenues for improved cancer management and personalized therapies. https://docs.google.com/presentation/d/1tbwHtp7UOVyT1D143d7hq8ssTKCH7Ocu/edit?usp=share_link&ouid=118073055971768511611&rtpof=true&sd=true
50 208 Sonali Gupta General Poster only Gupta Genetic and social risk factors for type 2 diabetes health disparities: a test of the Rose hypothesis Sonali Gupta, I. King Jordan, Leonardo Mariño Ramirez Sonali Gupta - National Institute on Minority Health and Health Disparities (NIMHD),  Applied Bioinformatics Laboratory (ASRT Inc.)

I. King Jordan - Georgia Institute of Technology

Leonardo Marino Ramirez - National Institute on Minority Health and Health Disparities (NIMHD)





Background: The Rose hypothesis predicts that since genetic variation is much greater within than between populations, genetic risk factors should be associated with individual cases but not population disparities, and since environmental exposure variation is much greater between than within populations, environmental risk factors should be associated with population disparities but not individual cases.



Methods: We used a cross-sectional study of the UK Biobank to test the Rose hypothesis for type 2 diabetes (T2D) ethnic disparities in the United Kingdom (UK). Our cohort consists of 26,912 participants, enrolled 2006-2010, from Asian, Black, and White ethnic groups. We modeled T2D genetic risk using a polygenic risk score (PRS) and socioeconomic deprivation using the Townsend Index (TI), with age and sex as covariates. Within and between ethnic group variance components were estimated for genetic risk and socioeconomic deprivation.
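The within (w) and between (b) percentages in the Results can be read as a one-way sum-of-squares decomposition of a risk factor such as PRS or TI across ethnic groups. A minimal sketch, assuming that is how the variance components were split; the abstract does not specify the estimator, and a random-effects model would give somewhat different numbers. The function name is hypothetical.

```python
def variance_partition(values_by_group):
    """Share of the total sum of squares lying within vs between groups."""
    all_values = [v for vals in values_by_group.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = 0.0
    ss_between = 0.0
    for vals in values_by_group.values():
        group_mean = sum(vals) / len(vals)
        ss_within += sum((v - group_mean) ** 2 for v in vals)
        ss_between += len(vals) * (group_mean - grand_mean) ** 2
    # ss_within + ss_between equals ss_total up to floating-point rounding.
    return {"within": ss_within / ss_total, "between": ss_between / ss_total}
```

Even when "within" dominates, a non-zero "between" share means group means differ, which is the point the Conclusions make about the Rose hypothesis.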



Results: T2D prevalence differs among the Asian 23.3% (OR=5.02, CI=4.57-5.52), Black 16.6% (OR=3.93, CI=3.54-4.37), and White 7.1% (reference) ethnic groups in the UK. Both genetic and socio-environmental T2D risk factors show greater within (w) than between (b) ethnic group variation: PRS w=64.4%, b=35.6%; TI w=71.7%, b=28.3%. Nevertheless, genetic risk (PRS OR=1.96, CI=1.87-2.07) and socioeconomic deprivation (TI OR=1.09, CI=1.08-1.10) are associated with individual T2D risk and mediate T2D ethnic disparities (Asian PRS=22.5%, TI=9.8%; Black PRS=32.0%, TI=25.3%).



Conclusions: A relative excess of within versus between group variation does not preclude risk factors from contributing to group differences. Our results support an integrative approach to health disparities research that includes both genetic and socio-environmental risk factors.
https://f1000research.com/posters/12-1552
51 216 Clara (Mengzhou) Hu General Poster only Hu Evaluation of Large Language Models for Discovery of Gene Set Function Mengzhou Hu1, Sahar Alkhairy2, Ingoo Lee1, Rudolf T. Pillich1, Robin Bachelder1, Trey Ideker1,2, and Dexter Pratt1 1. Department of Medicine, University of California San Diego, La Jolla, California, USA,

2. Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA

Abstract

Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI’s GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions from its embedded biomedical knowledge. We created a GPT-4 pipeline to label gene sets with names that summarize their consensus functions, substantiated by analysis text and citations. Benchmarked against named gene sets in the Gene Ontology, GPT-4 generated very similar names in 50% of cases, and in most remaining cases it recovered the name of a more general concept. For gene sets discovered in ‘omics data, GPT-4 names were more informative than gene set enrichment, with supporting statements and citations that were largely verified in human review. The ability to rapidly synthesize common gene functions positions LLMs as valuable functional genomics assistants.

https://f1000research.com/posters/12-1555
52 217 Matthew Inkman General Poster only Matthew Inkman Deep Learning Radiogenomic Analysis of Cervical Cancer Recurrence Risk from Pre-treatment FDG-PET Imaging and RNA-seq Data Matthew Inkman, Micah Benson, Alexander Eaton, Heming Zhang, Reethika Veluri, Michael R. Waters, S. Joshua Swamidass, Thomas Mazur, Abhinav K. Jha, Julie K. Schwarz, Jin Zhang Department of Radiation Oncology-Washington University School of Medicine, Washington University in St. Louis, Washington University in St. Louis, Washington University in St. Louis, Washington University in St. Louis, Department of Radiation Oncology-Washington University School of Medicine, Washington University in St. Louis, Department of Radiation Oncology-Washington University School of Medicine, Washington University in St. Louis, Department of Radiation Oncology-Washington University School of Medicine, Department of Radiation Oncology-Washington University School of Medicine As many as 30-50% of patients with locally advanced cervical cancer (LACC) experience recurrence after standard-of-care chemoradiation therapy (CRT), creating a critical need to identify pre-treatment biomarkers of treatment failure. The ability to identify such biomarkers from non-invasive pre-treatment FDG-PET imaging would be particularly useful for guiding treatment planning in clinical settings. In this study, we assess the ability of a variety of techniques to predict time-to-recurrence in a Washington University clinical cohort of 90 patients who received pre-treatment FDG-PET and uniform, curative-intent CRT. We begin with Cox regression analysis of radiomics features derived from patient images, proceed to machine learning (ML) models (random forest, support vector classifier, logistic regression) trained on sets of information-dense radiomics features, and finally apply deep learning radiomics features.
The performance of all models was assessed via 5-fold cross-validation with ROC AUC averaged across all folds. The PET prognostic signature constructed from Cox regression had a mean F1 score of 0.60; elastic net regression, the top-performing ML model, had a mean F1 score of 0.86; and the deep learning predictor had a mean F1 score of 0.90. The top 5 input features contributing to the DNN output were determined with the Captum integrated gradients attribution algorithm, and for the subset of this cohort with RNA-seq data available (n=46), we identified genes significantly correlated with one or more of these top radiomic features. We then conducted pathway enrichment analysis on this gene set to determine the biological programs associated with the predictive radiomic features. https://wustl.box.com/s/x0ypgow42593m131aev8jdbv06p0748e
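Because the abstract reports F1 averaged over 5 folds, the evaluation loop can be sketched independently of the model. A minimal pure-Python sketch, assuming binary recurrence labels and caller-supplied `fit`/`predict` functions (hypothetical placeholders for any of the models above). The folds here are unstratified; with ~90 patients, a real analysis would more likely stratify them by recurrence status.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = recurrence); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_f1(X, y, fit, predict, k=5, seed=0):
    """Mean F1 over k folds; `fit` and `predict` wrap any classifier."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        scores.append(f1_score([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)
```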
53 193 Jin-Woo Kim General Poster only Kim CLIP-IVP: CLIP-based Intraoral View Prediction Using Conditional GANs Jin-Woo Kim, Yewon Lim, Heejin Yu, Jongeun Choi Ewha Womans University (Seoul, Republic of Korea),

Yonsei University (Seoul, Republic of Korea),

Yonsei University (Seoul, Republic of Korea),

Yonsei University (Seoul, Republic of Korea)
Objectives: Although intraoral images play a pivotal role in orthodontics, obtaining a complete set of standardized intraoral images in real-world clinical settings remains challenging. We therefore propose a novel method, CLIP-based Intraoral View Prediction (CLIP-IVP), that uses a pre-trained CLIP image encoder to predict a medically standardized full set of intraoral images from front intraoral images without additional clinical information.

Material and Methods: We frame our objective as image-to-image translation. The training dataset consists of tuples (x_i, y_i, c_i), where x_i, y_i, and c_i represent, respectively, the front intraoral image, the corresponding view of the intraoral structures, and the corresponding class label. Forwarding x_i through the CLIP image encoder yields the corresponding CLIP image latent vector κ_i. During training, the decoder is conditioned on κ_i and c_i.

Results: We train our model on an internal dataset of 7,000 pairs of intraoral images, while the test and validation sets contain 2,048 and 512 pairs, respectively. We achieve an FID of 3.4 on image-to-image translation. By shifting an image latent according to the difference between the pre-treatment and post-treatment image latents, our model can also progressively predict a patient's orthodontic treatment process using only a pair of source images.
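The treatment-progression use case amounts to latent arithmetic on CLIP image latents. A minimal sketch, assuming latents are plain vectors and that intermediate stages are obtained by equal linear steps along the pre-to-post difference; each returned latent would then be passed, with a class label, to the trained decoder (not shown). The function name and the linear schedule are assumptions.

```python
def progressive_latents(source, pre, post, steps=4):
    """Shift `source` along (post - pre) in `steps` equal increments.

    Returns steps + 1 latents: t = 0 (unchanged source) up to t = 1 (full shift).
    """
    if len(source) != len(pre) or len(pre) != len(post):
        raise ValueError("latents must have the same dimension")
    direction = [b - a for a, b in zip(pre, post)]
    return [
        [s + (k / steps) * d for s, d in zip(source, direction)]
        for k in range(steps + 1)
    ]
```

Since CLIP embeddings are often compared after L2 normalisation, a variant of this sketch might re-normalise each intermediate latent before decoding; whether CLIP-IVP does so is not stated.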

Conclusion: Our model provides a promising solution for intraoral view prediction and has the potential to improve the efficiency and accuracy of orthodontic treatment planning.

https://drive.google.com/file/d/1bm91lComWu4yGGBYBJnuJY56muerOrXU/view?usp=drive_link
54 211 Eunjee Lee General Poster only Ji Exploring resting-state fMRI using functional regression models with application to Alzheimer's disease Ido Ji, Eunjee Lee Chungnam National University, Department of Bio AI Convergence,

Chungnam National University, Department of Information and Statistics
Functional data analysis (FDA) methods remain understudied in resting-state fMRI (rs-fMRI) research. Traditional approaches applying functional principal component analysis (FPCA) to raw BOLD signals have proven inadequate because of the complex nature of fMRI data. In this study, we introduce a new approach that computes Euclidean distances between brain regions to construct a new form of functional data, which is then analyzed with FPCA followed by group-penalized logistic regression. This approach outperformed existing logistic regression models based on functional connectivity measures, indicating its potential for selecting relevant brain regions in future Alzheimer’s disease (AD) research. These findings suggest that the method can contribute significantly to identifying neural substrates related to AD, which may advance understanding of the disease and inform potential interventions.
55 184 Sung Hwan Lee General Poster only Lee Comprehensive analysis using transcriptomic signatures from developmental hierarchy reveals two clinically distinct subtypes of stem cell-like hepatocellular carcinoma. Sung Hwan LEE, Bo Hwa SOHN, Yun Seong JEONG, Ji-Hyun SHIN, Ju-Seog LEE Department of Surgery, CHA Bundang Medical Center, CHA University School of Medicine, Korea,

Department of Systems Biology, University of Texas MD Anderson Cancer Center, USA

Background: Hepatocellular carcinoma (HCC) is a lethal malignancy with the second-highest cancer mortality worldwide. The genomic features of stem cell-like cancer cells that contribute to aggressive tumor biology and therapeutic resistance in HCC remain unclear. The aim of this study is to develop novel prediction models for stem cell-like hepatocellular carcinoma (scHCC) in clinical HCC cohorts and to understand the underlying biology of HCC stemness by integrating multi-platform data, including the genome, epigenome, transcriptome, and proteome.

Methods: Fetal liver signatures were extracted by analyzing single-cell transcriptomic data from human fetal liver at 10 and 17 weeks of gestation and from mature hepatocytes in adult liver. Using the Bayesian compound covariate predictor (BCCP) algorithm, the HSC signatures were then applied to gene expression data from HCC tumors to stratify tumors by stemness, followed by multi-platform analysis.
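The BCCP step combines a compound covariate (a t-statistic-weighted sum of expression values) with Bayes' rule. A simplified two-class sketch, assuming a normal distribution of the compound score within each class and equal priors; the three-group SC/HB/MH assignment in the abstract would need one such classifier per contrast, and gene selection, normalisation, and the authors' probability cut-offs are not shown. Function names are hypothetical.

```python
import math
from statistics import mean, stdev


def welch_t(a, b):
    """Welch t-statistic comparing the means of samples a and b."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


def fit_ccp(X, labels):
    """X: samples x genes; labels: 1 for the signature class, 0 otherwise."""
    pos = [x for x, lab in zip(X, labels) if lab == 1]
    neg = [x for x, lab in zip(X, labels) if lab == 0]
    # Compound covariate weights: one t-statistic per gene.
    weights = [welch_t([x[j] for x in pos], [x[j] for x in neg]) for j in range(len(X[0]))]

    def score(x):
        return sum(w * v for w, v in zip(weights, x))

    pos_scores = [score(x) for x in pos]
    neg_scores = [score(x) for x in neg]
    return weights, (mean(pos_scores), stdev(pos_scores)), (mean(neg_scores), stdev(neg_scores))


def _log_normal(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd)  # shared constant omitted


def posterior_positive(model, x, prior_pos=0.5):
    """Bayes-rule probability that sample x belongs to the signature class."""
    weights, (mu_p, sd_p), (mu_n, sd_n) = model
    s = sum(w * v for w, v in zip(weights, x))
    log_odds = (math.log(prior_pos) + _log_normal(s, mu_p, sd_p)) - (
        math.log(1 - prior_pos) + _log_normal(s, mu_n, sd_n))
    # Numerically stable logistic transform of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)
```

Reporting a posterior probability rather than a hard label is what lets BCCP-style classifiers leave ambiguous tumors unassigned when neither class is clearly favored.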

Results: The BCCP algorithm produced a robust transcriptomic classifier that discriminates stem cell (SC) and hepatoblast (HB) genomic features from differentiated HCC (mature hepatocyte, MH). Patients assigned to the SC group showed aggressive tumor features, including large tumor size, high AFP, vascular invasion, and extrahepatic metastasis, as well as the worst prognosis, with early recurrence. Cancer-associated pathways, including cell cycle, epithelial-mesenchymal transition, and TGF-beta signaling, were highly upregulated in the SC group. The HB group showed high activity of MYC downstream pathways and of metabolic pathways for nucleotides, the TCA cycle, and amino acids. Multi-platform analysis revealed distinct molecular patterns among the three groups derived from the hepatic stem cell signatures. Loss-of-function mutations in TP53 and RB1, together with PTEN deletion, were significantly more frequent in the SC group. SPI1 was the most influential transcription factor in the SC group, MYC in the HB group, and HNF4A in the MH group.

Conclusions: Stemness in HCC is associated not only with clinical outcomes but also with multiple genomic and proteomic traits of tumors.

 
56 231 Joshua Levy General Poster only Srivastava Biomedical National Elemental Imaging Resource Co-Registration Tool Facilitates Metals-Based Pathway Analysis Aruesha Srivastava, Neha Shaik, Yunrui Lu, Matthew Chan, Alos Diallo, Serin Han, Ramsey Steiner, Tracy Punshon, Brian Jackson, Linda Vahdat, Louis Vaickus, Jack Hoopes, Fred Kolling IV, Jonathan Marotti, Joshua Levy

Grafton High School, Cupertino High School, Dartmouth Health, Dartmouth Health, Dartmouth College, Dartmouth Health, Dartmouth College, Dartmouth College, Dartmouth College, Dartmouth Health, Dartmouth Health, Dartmouth College, Dartmouth College, Dartmouth Health, Cedars Sinai Medical Center Trace elements, both essential and toxic, are pivotal in biological processes, including cancer. Elements like copper, cadmium, and iron play significant roles in mitochondrial metabolism, cell proliferation, and proangiogenic pathways. A key factor in these processes is the competitive binding of metals to metal transporters, a mechanism conserved across species for maintaining homeostasis. Disruptions to homeostasis could reveal new biomarkers and therapeutic targets. Traditional bulk measurements overlook the significance of specific tissue architectures, potentially obscuring critical associations. Spatially resolved metal analysis through techniques like laser ablation inductively coupled plasma time-of-flight mass spectrometry (LA-ICPTOF-MS) offers detailed maps of multi-elemental distributions. However, correlating findings with tissue architectures necessitates complex pathological examination and co-registration. To address this, we developed a web application within the Biomedical National Elemental Imaging Resource that facilitates co-registration of high-resolution whole slide images (H&E, mIF, IHC) with elemental maps. This tool enables comparisons of metal abundance across different tissue structures and integration with additional spatial data, such as spatial transcriptomics. We applied this application in a pilot study on a colorectal tumor (stage pT3), integrating spatial transcriptomics (~18,000 genes) and cell-type proportions inferred from scRNASeq with LA-ICPTOF-MS. We identified associations between copper and immune activation and between iron and mesenchymal phenotypes. 
Further analysis using MEFISTO, a latent factor model, revealed distinct profiles of metals, genes, and cell types associated with different tissue histologies. For example, Factor Five, linked to epithelial/mesenchymal cells, correlated with genes involved in the epithelial to mesenchymal transition (e.g., COL1A1, COL1A2, SPARC) and elements like Se80, Mg24, and K41. Factor Ten was associated with iron, cell proliferation markers, proliferating B cells, and macrophages. This approach offers a comprehensive perspective on the role of metal bioaccumulation in tumor biology and potential pathways for therapeutic intervention. Applying this web application across diverse tissue contexts will help establish the broader validity of our findings. https://f1000research.com/posters/12-1560
57 198 Siru Liu General Poster only Liu Why Do Users Override Alerts? Utilizing Large Language Model to Summarize Comments and Optimize Clinical Decision Support Siru Liu, Allison B. McCoy, Aileen P. Wright, Scott D. Nelson, Sean S. Huang, Hasan B. Ahmad, Sabrina E. Carro, Jacob Franklin, James Brogan, Adam Wright Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Computer Science, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA Objective: To evaluate the ability of generative artificial intelligence (AI) to summarize alert comments and to determine whether AI-generated summaries could be used to improve clinical decision support (CDS) alerts.

Methods: We extracted user comments on alerts generated from September 1, 2022 to September 1, 2023 at Vanderbilt University Medical Center. Human summaries were written by two physicians, and AI summaries were generated by GPT-4. Five CDS experts completed a questionnaire survey rating the human-generated and AI-generated summaries on a scale from 1 (strongly disagree) to 5 (strongly agree) for four metrics: clarity, completeness, accuracy, and usefulness.

Results: Five CDS experts participated in the survey. Of the eight top-rated summaries, five were generated by GPT-4. AI-generated summaries demonstrated high levels of clarity, accuracy, and usefulness, similar to the human-generated summaries. Moreover, AI-generated summaries exhibited significantly higher completeness and usefulness than the human-generated summaries (AI: 3.4±1.2, Human: 2.7±1.2, P=0.001).

Conclusion: End-user comments capture clinicians’ immediate feedback on CDS alerts and can serve as a direct and valuable data resource for improving CDS delivery. Traditionally, these comments may not be considered in the CDS review process because of their unstructured nature, large volume, and redundant or irrelevant content. Our study demonstrates that GPT-4 is capable of distilling these comments into summaries characterized by high clarity and accuracy, as well as superior completeness. These AI-generated summaries could provide CDS experts with a novel means of reviewing user comments to rapidly optimize CDS alerts both online and offline.

 
58 226 Taralynn Mack General Poster only Mack Epigenetic and proteomic signatures associate with clonal hematopoiesis expansion rate Taralynn Mack, MA Raddatz, Joshua S. Weinstock, Sidd Jaiswal, Alexander G. Bick Vanderbilt Genetics Institute, Vanderbilt University School of Medicine, Nashville, TN, USA, Department of Medicine, University of California, Los Angeles, Los Angeles, CA, USA, Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA, Department of Pathology, Stanford University, Stanford, CA, USA, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA Clonal hematopoiesis of indeterminate potential (CHIP) is a clonal expansion of hematopoietic stem cells that confers increased risk of blood cancer, heart disease, and death. It is presently unknown what causes CHIP clones with identical driver mutations to expand at different rates in humans. Here, we leverage genetically predicted traits to identify factors that determine CHIP clonal expansion rate. We used the passenger-approximated clonal expansion rate (PACER) method to quantify clonal expansion rate for 4,370 individuals with CHIP mutations in the NHLBI TOPMed cohort and calculated polygenic risk scores for DNA methylation aging measures, inflammation-related lab values, disease traits, and circulating protein levels. We tested for associations between these predicted traits and PACER with multivariable linear regression controlling for covariates. CHIP clonal expansion rate was significantly associated with both genetically predicted and measured epigenetic clocks (p < 0.01). No associations were identified between overall CHIP expansion rate and inflammation-related lab values or diseases. CHIP driver gene-specific analyses identified TNF-alpha signaling as contributing to DNMT3A expansion. 
An unbiased proteome-wide search identified predicted circulating levels of myeloid zinc finger 1 and anti-müllerian hormone as associated with an increased CHIP clonal expansion rate, and TIMP metallopeptidase inhibitor 1 and glycine N-methyltransferase as associated with a decreased CHIP clonal expansion rate. In summary, we identified specific biological programs and protein expression patterns that associate with CHIP expansion rate.
59 210 Onur Mutlu General Poster only Firtina Accurate, Fast, and Scalable Real-Time Analysis of Raw Nanopore Signals Can Firtina, Joel Lindegger, Nika Mansouri Ghiasi, Melina Soysal, Gagandeep Singh, Meryem Banu Cavlak, Haiyu Mao, Mohammad Sadrosadati, Mohammed Alser, Onur Mutlu ETH Zurich Nanopore sequencers generate raw electrical signals in real time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. In a series of works, we demonstrate that new algorithms and tools for such raw nanopore signal analysis can greatly accelerate and enhance genome analysis by avoiding costly and information-reducing basecalling approaches. We describe two such tools: RawHash and RawAlign.



RawHash is the first mechanism that can accurately and efficiently perform real-time analysis of raw nanopore signals for large genomes using hash-based similarity search. To enable this, RawHash ensures that signals corresponding to the same DNA content lead to the same hash value, despite slight variations in these signals. It achieves accurate hash-based similarity search through an effective quantization of the raw signals, such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value. RawHash can be found at https://doi.org/10.1093/bioinformatics/btad272 and https://github.com/CMU-SAFARI/RawHash.
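To make the quantize-then-hash idea concrete, here is a hedged sketch, not RawHash's implementation. It assumes the raw signal has already been segmented into normalized event values, uses hypothetical equal-width buckets over an assumed range of [-3, 3], and hashes windows of k consecutive quantized events with Python's built-in `hash`. Small perturbations that stay inside a bucket yield identical seeds; values lying near a bucket boundary can still flip, which a real tool has to handle more carefully.

```python
def quantize(value, lo=-3.0, hi=3.0, n_bins=16):
    """Map a normalized event value to one of n_bins equal-width buckets (clipped to [lo, hi))."""
    clipped = min(max(value, lo), hi - 1e-9)
    return int((clipped - lo) / (hi - lo) * n_bins)


def seed_hashes(events, k=3, **quant_kwargs):
    """Hash every window of k consecutive quantized events."""
    q = [quantize(v, **quant_kwargs) for v in events]
    return [hash(tuple(q[i:i + k])) for i in range(len(q) - k + 1)]


# Two noisy readings of the same (hypothetical) DNA content.
read_1 = [-1.20, 0.40, 1.90, -0.30]
read_2 = [-1.22, 0.41, 1.88, -0.31]
print(seed_hashes(read_1) == seed_hashes(read_2))  # True
```

Matching seed hashes between a read and a reference index is what turns similarity search into fast hash-table lookups.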



RawAlign is the first Seed-Filter-Align mapper for raw nanopore signals. RawAlign combines raw signal seeding and filtering approaches from prior work with a customized high-performance dynamic time warping implementation. RawAlign can be found at https://arxiv.org/abs/2310.05037 and https://github.com/CMU-SAFARI/RawAlign.
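RawAlign's alignment step builds on dynamic time warping (DTW). For readers unfamiliar with it, the following is the textbook O(n·m) DTW recurrence with an absolute-difference cost; it is a plain illustration, not RawAlign's customized high-performance implementation, whose details the abstract does not give.

```python
import math


def dtw_distance(x, y):
    """Classic dynamic time warping distance between two 1-D signals."""
    n, m = len(x), len(y)
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: diagonal match, step in x only, step in y only.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


# A signal and a time-stretched copy align with zero cost.
print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))  # 0.0
```

Because DTW tolerates stretching along the time axis, it suits nanopore signals, where the strand's translocation speed varies and the same base may yield a variable number of samples.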



Our evaluations demonstrate that RawHash and RawAlign can provide high accuracy and high throughput for analyzing large genomes in real-time.



https://people.inf.ethz.ch/omutlu/pub/RawStar_RealTime_Nanopore_Analysis.pdf
60 214 Alyssa Parker General Poster only Parker Monocyte-endothelial cell interactions heightened in clonal hematopoiesis of indeterminate potential Alyssa C. Parker, J. Brett Heimlich, Ayesha Ahmad, Samuel Bailin, John Koethe, Celestine Wanjalla, Alexander G. Bick Vanderbilt University School of Medicine, Vanderbilt University School of Medicine, Vanderbilt University School of Medicine, Vanderbilt University School of Medicine, Vanderbilt University School of Medicine, Vanderbilt University School of Medicine, Vanderbilt University School of Medicine Background: Clonal hematopoiesis of indeterminate potential (CHIP) refers to the expansion of hematopoietic cells carrying somatic mutations. CHIP has been associated with increased risk of several adverse vascular conditions. To date, circulating immune cells from patients with CHIP have been characterized as proinflammatory. However, circulating immune cells often travel into peripheral tissue, and the action of such cells in human peripheral tissues has not yet been studied in the context of CHIP.

Methods: We hypothesized that circulating CD14+ monocytes carrying CHIP mutations interact with blood vessel endothelial cells differently than wildtype CD14+ monocytes. To address this question, we performed single-cell RNA sequencing on peripheral blood mononuclear cells and subcutaneous adipose tissue in 6 CHIP patients and 6 matched controls. We also used publicly available data from the Chan Zuckerberg Initiative (CZI) to predict how CHIP-mutated cells interact with endothelial cells from 10 unique human tissues.

Results: We found that circulating CD14+ monocytes carrying CHIP mutations had marked increases in signaling to endothelial cells. Specifically, mutant monocytes showed enhanced signaling between ligand and receptor pairs related to leukocyte transendothelial migration. Simulations using endothelial cells from the CZI database showed similar patterns in several tissues.

Conclusions: These findings suggest that CHIP may alter how monocytes interact with blood vessel endothelial cells across several human tissues. Alterations in the monocyte-endothelial interaction may explain some of the increased vascular risks associated with CHIP.

https://f1000research.com/posters/12-1542
61 212 Yash Pershad General Poster only Pershad Quantifying individual level mosaic chromosomal alteration fitness: genetic determinants and clinical consequences  Yash Pershad, Taralynn Mack, Paul Scheet, Paul Auer, Siddhartha Jaiswal, Josh Weinstock, Alexander Bick  Vanderbilt University, Vanderbilt University, University of Texas M. D. Anderson Cancer Center, Medical College of Wisconsin, Stanford University, University of Michigan, Vanderbilt University  Clonal hematopoiesis (CH) is characterized by a clonally expanded population of hematopoietic stem cells. CH can be caused by single nucleotide mutations in myeloid cancer driver genes, termed CHIP, or by larger structural rearrangements called mosaic chromosomal alterations (mCAs). Factors modulating variation in mCA clonal fitness are poorly understood. 



To address this, we extended a recently developed method called passenger-approximated clonal expansion rate (PACER) to quantify mCA fitness from a single blood sample. In this study, we apply PACER to 6,381 individuals from the NHLBI TOPMed cohort with mCAs involving gain, loss, and copy-neutral loss of heterozygosity. 



To validate PACER, we compared our fitness estimates to an alternative method that infers mCA fitness using variant allele frequency probability distributions from the UK Biobank. PACER estimates, aggregated by mCA location and type, exhibit a high correlation (R² = 0.49) with this orthogonal approach. 



Unlike prior methods, PACER enables quantifying mCA clone expansion rate at the individual level, facilitating the identification of germline risk loci associated with clonal expansion and quantifying phenotypic consequences of mCA expansion rate. 



We leveraged PACER to identify germline mutations that alter mCA fitness. We conducted a genome-wide association study of mCA PACER and identified a single locus (TCL1A) that reached genome-wide significance. Our PACER analyses also nominate NRIP1, a locus previously associated with mCA prevalence at genome-wide significance, as acting by modulating clonal fitness. 



We sought to understand whether individuals with mCAs expanding at a faster rate had adverse clinical outcomes. Within the subset of mCAs that are known to cause lymphoid malignancies, we found that increased mCA expansion rates were associated with higher lymphocyte counts (p = 0.019). Future work will evaluate whether individuals with increased mCA expansion rates are also at the highest risk of progression to overt hematologic malignancy. 
https://f1000research.com/posters/12-1544
62 215 Hannah Poisner General Poster only Poisner Genetic determinants and phenotypic consequences of blood T-cell abundance in 207,000 diverse individuals  Hannah Poisner, Annika Faucon, Nancy Cox, Alexander Bick Vanderbilt University School of Medicine,Vanderbilt University School of Medicine,Vanderbilt University School of Medicine,Vanderbilt University School of Medicine T-cells play a critical role in multiple aspects of human health and disease. However, to date the genetic determinants of human T-cell abundance have not been studied at scale because assays quantifying T-cell abundance are not widely used in clinical or research settings. The complete blood count clinical assay quantifies lymphocyte abundance, which includes T-cells, B-cells, and NK-cells, but does not distinguish T-cells from the other lymphocytes. To address this gap, we directly estimated T-cell fractions from whole genome sequencing data in over 200,000 individuals from the multi-ethnic TOPMed and All of Us studies. We identified 27 loci associated with T-cell fraction. Interrogating electronic health records identified clinical phenotypes associated with T-cell fraction, including highly dynamic changes in T-cell abundance over the course of pregnancy. In summary, by estimating T-cell abundance, we obtained new insights into the genetic regulation of T-cells and identified disease consequences of T-cell fractions across the human phenome. https://f1000research.com/author/poster/preview/1119677
63 228 Gokul Srinivasan General Poster only Srinivasan Spatial Transcriptomics Inference for the Elucidation of Disease Pathogenesis Across Large Scale Histopathology Cohorts: A Preliminary Analysis in Skin Photoaging Gokul Srinivasan, Matt Davis, Matt LeBoeuf, Josh Levy

Cedars Sinai Medical Center, Dartmouth Health, Dartmouth Health, Cedars Sinai Medical Center Background: Spatial transcriptomics (ST) technologies have revolutionized our understanding of tissue biology, offering spatially resolved insights into diverse tissues and processes, from cancer to embryonic development. Though various ST methods have been developed – each with differing multiplexing and spatial resolution – all have contributed to our burgeoning knowledge regarding human disease and development. ST methods have helped illustrate spatial molecular diversity within tumors and uncovered disease pathways in conditions like diabetic foot ulcers and muscular dystrophy. In development, ST methods have enhanced our grasp of embryology, revealing gene regulation patterns that drive early-life cellular differentiation. However, high costs, the need for skilled technicians, and complex analysis have limited ST's broader application.



Methods: To address these challenges, we present a cost-effective ST inference method that uses standard H&E slides. It employs a vision transformer model that shows promising results in predicting the expression of 1,000 genes, with a median Spearman’s correlation of 0.600 (CI: 0.580-0.610) across four-fold cross-validation. Inference on unseen slides involves automatic tissue masking, data patching, and model inference, followed by reconstruction of the ST data object. We applied this workflow to 261 skin tissue slides annotated by pathologists.
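The headline metric in the Methods above, a median of per-gene Spearman correlations between predicted and measured expression, can be computed as sketched below. This assumes predicted and measured values are available for each gene over the same set of spots; ties receive average ranks, the function names are illustrative rather than taken from the authors' code, and the reported confidence interval is not reproduced here.

```python
from statistics import median


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (undefined for constant input)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def median_gene_spearman(predicted, measured):
    """predicted/measured: dict gene -> list of expression values over the same spots."""
    return median(spearman(predicted[g], measured[g]) for g in predicted)
```

Summarizing with the median rather than the mean keeps a few poorly predicted genes from dominating the score across 1,000 targets.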



Results: Our approach successfully recapitulates genetic heterogeneity and histological features, allowing downstream methods to classify various histomorphologies from the synthetic data with high accuracy. We report macro-averaged, class-specific accuracies of 0.934, 0.986, 0.999, 0.934, 0.666, 0.970, 0.837, and 0.900 for the following histological categories, respectively: eccrine gland, epidermis, fat, hair follicle, nerve, sebaceous gland, smooth muscle, and vessel.



Conclusions: These findings indicate the potential of ST inference, which enables the prediction of histological identity from synthetic gene expression data and the identification of relevant genes. Study limitations include a homogeneous training set and a limited set of gene targets; nonetheless, this work offers the potential to democratize access to high-resolution ST data.
https://f1000research.com/posters/12-1557
64 218 Gwanggyu Sun General Poster only Sun The E. coli whole-cell modeling project Gwanggyu Sun, Riley Juenemann, Albert Zhang, Cyrus Knudsen, Mica Yang, Sean Cheah, Travis A. Ahn-Horst, Mialy M. DeFelice, Taryn E. Gillies, Cecelia J. Andrews, Markus Krummenacker, Peter D. Karp, Jerry H. Morrison, Markus W. Covert Stanford University, Stanford University, Stanford University, Stanford University, Stanford University, Stanford University, Stanford University, Stanford University, Stanford University, Stanford University, SRI International, SRI International, Stanford University, Stanford University In 1973, Francis Crick first called for a coordinated worldwide scientific effort to determine a “complete solution” of the bacterium Escherichia coli. We have been working for some years now to complete a mathematical model of E. coli that takes into account all of the known functions of every well-annotated gene, in order to better understand and predict the behavior of this scientifically relevant and industrially significant model organism. The E. coli whole-cell model is composed of multiple submodels that each simulate a particular biological process within the E. coli cell, and it integrates more than 19,000 parameters gathered from decades of research on this microbe by the scientific community. Here, we highlight our ongoing efforts to improve this model, most recently by adding new modeling to better describe growth rate control and transcription unit (operon) structures. We also present the latest applications of the model, in which we explore the constraints that shape the structures of the seven rRNA operons of E. coli and the metabolic burden of integrating new genes into the chromosome.
65 234 Tate Tunstall General Poster only Tunstall Evaluating the use of deep neural networks to predict polygenic disease in non-European populations  Tate Tunstall, Monika Sun, Darek Ratman, Rober Maier, Premal Shah, Matthew Rabinowitz, Kate Im  MyOme,MyOme,MyOme,MyOme,MyOme,MyOme & Natera, MyOme Polygenic risk scores (PRS) provide an estimate of an individual’s genetic disease risk using many variants across the whole genome. PRS are most often evaluated using linear methods such as linear or logistic regression; however, previous work has suggested that deep neural networks (DNNs) outperform linear methods for some traits, including breast cancer, potentially by capturing non-linear interactions between variants. Yet the performance of these models has not been thoroughly tested outside of European populations. In this study, we trained and evaluated a range of DNN architectures on several large cohorts (UK Biobank, Women’s Health Initiative, Multi-Ethnic Study of Atherosclerosis) for predicting two phenotypes, breast cancer and coronary artery disease (CAD), and then used transfer learning to fine-tune models to target non-European populations. We found that DNNs perform similarly to simpler methods such as penalized regression in European populations (mean 0.63 AUC breast cancer, 0.65 CAD), but their performance drops significantly in non-European populations (mean Δ AUC -0.06 relative to European model). Fine-tuning to specific populations improved performance (mean Δ AUC 0.03 relative to base model) but still underperformed linear methods that explicitly take ancestry into account.
66 220 Beichen Wang General Poster only Wang Feature Selection for Multivariate Time-Series Data After Tensor Reduction Beichen Wang, Yuk Fai Leung Purdue University, Purdue University Animals display dynamic behaviors upon stimulation, reflecting the underlying neural processing of the environment. The dynamic behavior can be captured by various behavioral assays that generate multivariate time-series datasets known as three-tensor (3T) data. For instance, in a visual-motor response (VMR) assay of zebrafish, multiple variables are recorded to capture zebrafish swimming behaviors in a time series. These variables include swimming distance and duration. Proper analysis of the 3T data can facilitate our understanding of the underlying neural processing, yet current multivariate analysis (MVA) is incompatible with 3T data because it handles only two-tensor (2T) data structures. To reduce 3T data for MVA, common methods are: 1) constructing one behavioral variable, e.g., combining distance-related variables into one total distance variable in VMR data, to reduce the variable tensor; and 2) concatenating all behavioral variables to reduce the time tensor. However, these methods may generate too few or too many features, limiting the ability of MVA to capture the rich repertoire of behavioral dynamics. To solve this problem, we proposed a feature selection workflow after tensor reduction and demonstrated its utility in classifying wildtype (WT) and visually impaired (mutant) zebrafish using VMR data. The feature selection workflow combines the filter and embedded methods by intersection or union. In the WT-mutant VMR data, we selected features that differed between WT and mutant, using t-test p-values and fold change for the filter method and random forest for the embedded method. We tested six classifiers with 10-fold cross-validation (CV) and found that their average CV accuracies were higher with feature selection than without. 
The average CV accuracies were higher when concatenating all behavioral variables than when constructing one behavioral variable. In conclusion, we built a feature selection workflow in multivariate time-series behavior data to capture the dynamic behavior and facilitate future research in neural processing.  https://f1000research.com/posters/12-1556
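The combination step of the workflow above (filter and embedded selections merged by intersection or union) can be sketched as follows. This is a hedged illustration: it assumes the t-test p-values, fold changes, and random forest importances have already been computed per feature, and the thresholds (`alpha=0.05`, |log2 fold change| ≥ 1, `top_k`) and feature names are hypothetical, since the abstract does not specify them.

```python
import math


def filter_select(pvalues, fold_changes, alpha=0.05, min_abs_log2fc=1.0):
    """Filter method: keep features with t-test p < alpha and |log2 fold change| >= threshold."""
    return {f for f, p in pvalues.items()
            if p < alpha and abs(math.log2(fold_changes[f])) >= min_abs_log2fc}


def embedded_select(importances, top_k):
    """Embedded method: keep the top_k features ranked by random forest importance."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:top_k])


def combine(filter_set, embedded_set, mode="intersection"):
    """Merge the two selections by set intersection or union."""
    if mode == "intersection":
        return filter_set & embedded_set
    if mode == "union":
        return filter_set | embedded_set
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical per-feature statistics (feature = behavioral variable x time bin).
pvalues = {"dist_t1": 0.01, "dist_t2": 0.20, "dur_t1": 0.001, "dur_t2": 0.03}
fold_changes = {"dist_t1": 2.5, "dist_t2": 1.1, "dur_t1": 0.3, "dur_t2": 1.2}
importances = {"dist_t1": 0.40, "dist_t2": 0.25, "dur_t1": 0.05, "dur_t2": 0.30}

f_set = filter_select(pvalues, fold_changes)
e_set = embedded_select(importances, top_k=2)
print(sorted(combine(f_set, e_set, "intersection")))  # ['dist_t1']
print(sorted(combine(f_set, e_set, "union")))  # ['dist_t1', 'dur_t1', 'dur_t2']
```

Intersection yields a small, conservative feature set on which both criteria agree, while union keeps any feature flagged by either method; which works better depends on the classifier and on how many features the tensor reduction produced.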
67 201 Kwasi Yeboah Afihene General Poster only Yeboah-Afihene The Association of Obesity on Metabolic Diseases Among District of Columbia (D.C.) Inpatient Population  Kwasi Yeboah-Afihene, John Kwagyan, Gail Nunlee-Bland, Nana Osafo, Edmund Ameyaw, Yayin Fang, William Southerland* Howard University College of Medicine, Howard University Department of Public Health, Howard University College of Medicine, Howard University College of Medicine, Howard University School of Pharmacy, Howard University College of Medicine, Howard University College of Medicine  Obesity affects morbidity and mortality worldwide, and its numerous comorbidities worsen the chronic disease epidemic. Several of these comorbidities are themselves metabolic diseases. The study examined the association of obesity with metabolic diseases among D.C. inpatients. The aims were 1) to identify the distributions of metabolic diseases by obesity status, race, payor, and Distressed Community Index (DCI) quantile; and 2) to determine the association of obesity with metabolic diseases, including cardiovascular disease (CV), diabetes, and hyperlipidemia, together with sociodemographic factors, including DCI and payor. The primary data source was the Healthcare Cost and Utilization Project (HCUP) for D.C. from 2017 to 2019, covering inpatients aged 18 to 95 and excluding pregnant women. The data were cleaned and prepared for analysis in Python. The disease distributions were calculated using cross-tabulations. Separate logistic regression models were fitted to examine the impact of obesity in predicting CV, diabetes, and hyperlipidemia, controlling for payor, DCI, and their respective interactions with obesity. The results suggest that while most of the selected metabolic diseases were associated with obesity, there was not enough evidence to suggest the same for cardiovascular diseases. 
It was also evident that obesity and diabetes are more prevalent among Hispanic and Black inpatients. In contrast, hyperlipidemia was more prevalent among non-Hispanic Whites. Younger inpatients (ages 18–54) were more likely to be obese than older inpatients (ages 55–95). In conclusion, obesity is more closely associated with diabetes and hyperlipidemia than with cardiovascular disease. Racial and ethnic groups showed disparities in metabolic disease prevalence. Insights from our study could inform practitioners of the need to reduce weight in people with obesity and to mitigate metabolic disease complications.  https://f1000research.com/posters/12-1548
68 213 Shuwen Zhang General Poster only Zhang Single-cell multiomic analysis of Alzheimer’s disease reveals disrupted gene regulatory networks in glial subpopulations Shuwen Zhang, Hongru Hu, Na Zhao, Yan W. Asmann, Yingxue Ren Department of Quantitative Health Sciences, Mayo Clinic, Rochester MN 55905, USA, Genome Center, University of California, Davis CA 95616, USA, Department of Neuroscience, Mayo Clinic, Jacksonville FL 32224, USA, Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville FL 32224, USA, Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville FL 32224, USA
Alzheimer’s disease (AD) is the most common form of dementia, primarily affecting memory and other cognitive functions. However, our molecular and cellular understanding of AD pathogenesis remains limited. In recent years, combining single-cell sequencing data from multiple omics technologies has enabled a more comprehensive view of disease pathobiology. In this study, we used paired single-cell transcriptomic (scRNA-seq) and chromatin accessibility (scATAC-seq) data (n=26) from the Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD) Consortium to elucidate molecular mechanisms of AD at the single-cell level. We benchmarked multiple multiomic integration methods based on deep learning or factorization analysis and built an optimized pipeline for comprehensive interrogation of multimodal profiles, which enables effective batch correction and cross-donor and cross-modality alignment to create a unified cell state space for discovering cell identities and states. Through integrative analysis of chromatin accessibility and gene expression profiles of microglia, astrocytes, and oligodendrocytes, we identified glial cell-specific regulatory programs mediated by various key transcription factors. We also characterized AD-associated functional changes in glial subpopulations and identified potential transcriptomic and epigenomic risk signatures. Our study provides a comprehensive workflow and a high-resolution portrayal of the perturbations in complex glial regulatory circuitry in AD, informed by joint analysis of single-cell multiomic data. Our findings provide a novel view of disease-state circuit remodeling and offer actionable regulatory programs for functional validation and therapeutic targeting.
69 237 Xikun Zhang General Poster only Zhang Building a virtual spatial human proteome using Super Multiplexed Cell Xikun Zhang,Wei Ouyang,Ishan Gaur,Emma Lundberg Stanford University,KTH,Stanford University,Stanford University Knowledge of the spatial distribution of proteins at a subcellular level is essential for understanding protein functions, interactions, and cellular mechanisms. Immunofluorescence microscopy has been used to resolve the spatial distribution of human proteins in cultivated cell lines and map them to cellular compartments and substructures with single-cell resolution, generating image-based subcellular human proteome maps such as the Human Protein Atlas (HPA) subcellular section. However, the imaging technology has very limited multiplexing capability, measuring the location and expression level of only one protein per sample. We have developed the Super Multiplexed Cell, a deep-learning-based generative model that can generate proteomics images of the whole human proteome. We have applied the model to the HPA subcellular section and shown that it generates realistic, heterogeneous, and accurate proteomics images. We have also developed metrics that accurately evaluate the generated images against the ground truths and match human preferences.
70 181 Yadi Zhou General Poster only Zhou AlzTarget: A multi-omics database for target identification and prioritization for Alzheimer's disease Yadi Zhou, Yuan Hou, Andrew A. Pieper, Jeffrey Cummings, Feixiong Cheng Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA; Harrington Discovery Institute, University Hospitals Cleveland Medical Center, Cleveland, OH, USA; Chambers-Grundy Center for Transformative Neuroscience, Department of Brain Health, School of Integrated Health Sciences, UNLV, Las Vegas, Nevada, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA

Genome-wide and systems biology approaches and resources that use the rapidly growing multi-omics data to identify likely molecular drivers and drug targets for Alzheimer's disease (AD) are still lacking. We constructed a web service and database named AlzTarget (https://alztarget.lerner.ccf.org/) with interactive visualizations. (1) We performed Mendelian Randomization (MR) analysis systematically using all combinations of GWAS outcome datasets (n=12) and quantitative trait locus (QTL) datasets (n=26) to assess causal relationships. Specifically, we compiled summary statistic datasets for three neurodegenerative diseases: AD (n=7), Parkinson's disease (n=2), and amyotrophic lateral sclerosis (n=3). QTL datasets include protein QTL (n=3), expression QTL (n=9), and splicing QTL (n=14) datasets. We applied stringent criteria to extract high-confidence instrumental variables (IVs) from the QTL datasets, retaining IVs that were specific and did not have widespread effects across multiple targets. We selected the appropriate MR model based on the number of proposed IVs for each target gene: n=1, Wald ratio estimator; n=2, inverse-variance weighted (IVW) fixed-effect model; n≥3, IVW random-effect model, MR-presso, Maximum Likelihood, Egger, and Weighted Median methods. The MR analysis generated over 1.7 million results. (2) We integrated differential expression results from an AD brain cell atlas of over 1.1 million cells/nuclei from 26 datasets and bulk RNA-seq data from three brain biobanks (Mayo, MSBB, and ROSMAP), covering expression comparisons (n=1,400) of various types, such as disease vs. control and APOE4/4 vs. APOE3/3. (3) We predicted the binding pockets of the proteins encoded by all the genes using artificial intelligence. (4) We compiled three types of biological networks: protein-protein interaction, drug-target interaction, and cell-cell interaction networks. 
We envision that the abundant information and actionable systems biology tools served by AlzTarget will be a valuable resource for target identification and prioritization of AD.
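The IV-count rule in the AlzTarget abstract above can be illustrated with the standard summary-statistic estimators. This is a minimal sketch, not AlzTarget's code. It assumes harmonized per-IV effects on the exposure (`bx`) and outcome (`by`) with outcome standard errors (`se_y`), uses the first-order Wald-ratio standard error, and implements the IVW random-effect option as the common multiplicative model (standard error inflated by the residual standard error when it exceeds 1). The other n≥3 methods listed (MR-presso, Maximum Likelihood, Egger, Weighted Median) are not shown, and the function names are hypothetical.

```python
import math


def wald_ratio(bx, by, se_y):
    """Single-IV Wald ratio estimate with its first-order standard error."""
    return by / bx, se_y / abs(bx)


def ivw(bxs, bys, se_ys, random_effects=False):
    """Inverse-variance weighted (IVW) causal estimate from per-IV summary statistics.

    Fixed-effect by default; with random_effects=True the standard error is
    multiplied by the residual standard error when it exceeds 1
    (multiplicative random-effects model).
    """
    weights = [1.0 / s ** 2 for s in se_ys]
    denom = sum(w * bx ** 2 for w, bx in zip(weights, bxs))
    beta = sum(w * bx * by for w, bx, by in zip(weights, bxs, bys)) / denom
    se = math.sqrt(1.0 / denom)
    if random_effects:
        # Cochran's Q around the IVW fit, scaled by its degrees of freedom.
        q = sum(w * (by - beta * bx) ** 2 for w, bx, by in zip(weights, bxs, bys))
        sigma = math.sqrt(q / (len(bxs) - 1))
        se *= max(1.0, sigma)
    return beta, se


def estimate_by_iv_count(bxs, bys, se_ys):
    """Pick the estimator by the number of IVs (n=1, n=2, n>=3)."""
    n = len(bxs)
    if n == 1:
        return "Wald ratio", wald_ratio(bxs[0], bys[0], se_ys[0])
    if n == 2:
        return "IVW fixed-effect", ivw(bxs, bys, se_ys)
    return "IVW random-effect", ivw(bxs, bys, se_ys, random_effects=True)
```

With a single IV there is no heterogeneity to estimate, and with two IVs the residual variance rests on a single degree of freedom, which is a reasonable rationale for reserving random-effect and robust methods for n≥3.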