Poster #

Presenting Poster Author Name

Last

Session/Workshop Area

Abstract Type

Title

Last Name of First Author

List all authors (first name first with names separated by commas and proper capitalization) in the order they appear on the abstract. Please do NOT list affiliations or addresses in this field.

Author affiliations (in order of the list of authors). Please separate affiliations with commas.

Abstract (300 words or less)

Poster DOI or URL (if you are uploading a PDF, type "N/A" in this field)

Poster PDF

Rachit

Kumar

30th anniversary

Accepted proceedings paper with poster presentation

A Comprehensive Bibliometric Analysis: Celebrating the Thirtieth Anniversary of the Pacific Symposium on Biocomputing

Kumar

Rachit Kumar, Rasika Venkatesh, David Y. Zhang, Teri E. Klein, Marylyn D. Ritchie

University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, Stanford University, University of Pennsylvania

The 2025 Pacific Symposium on Biocomputing (PSB) represents a remarkable milestone, as it is the thirtieth anniversary of PSB. We use this opportunity to analyze the bibliometric output of 30 years of PSB publications in a wide range of analyses with a focus on various eras that represent important disruptive breakpoints in the field of bioinformatics and biocomputing. These include an analysis of paper topics and keywords, flight emissions produced by travel to PSB by authors, citation and co-authorship networks and metrics, and a broad assessment of diversity and representation in PSB authors. We use the results of these analyses to identify insights that we can carry forward to the upcoming decades of PSB.

N/A

Leah

Zhang

30th anniversary

Accepted proceedings paper with poster presentation

Charting the Evolution and Transformative Impact of the Pacific Symposium on Biocomputing Through a 30-Year Retrospective Analysis of Collaborative Networks and Themes Using Modern Computational Tools

Zhang

Leah Zhang, Sameeksha Garg, Edward Zhang, Sean McOsker, Carly Bobak, Kristine Giffin, Brock Christensen, Joshua Levy

Thomas Jefferson High School for Science & Technology, Carnegie Mellon University, Dartmouth College Geisel School of Medicine, Dartmouth College Geisel School of Medicine, Dartmouth College Geisel School of Medicine, Dartmouth College Geisel School of Medicine, Dartmouth College Geisel School of Medicine, Cedars Sinai Medical Center

Founded nearly 30 years ago, the Pacific Symposium on Biocomputing (PSB) has continually promoted collaborative research in computational biology, annually highlighting emergent themes that reflect the expanding interdisciplinary nature of the field. This study aimed to explore the collaborative and thematic dynamics at PSB using topic modeling and network analysis methods. We identified 14 central topics that have characterized the discourse at PSB over the past three decades. Our findings demonstrate significant trends in topic relevance, with a growing emphasis on machine learning and integrative analyses. We observed not only an expanding nexus of collaboration but also PSB’s crucial role in fostering interdisciplinary collaborations. It remains unclear, however, whether the shift towards interdisciplinarity was driven by the conference itself, external academic trends, or broader societal shifts towards integrated research approaches. Future applications of next-generation analytical methods may offer deeper insights into these dynamics. Additionally, we have developed a web application that leverages retrieval augmented generation and large language models, enabling users to efficiently explore past PSB proceedings.

N/A

zhang_l.pdf

Maxwell

Levis

AI and Machine Learning in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface

Accepted proceedings paper with poster presentation

Investigating the Differential Impact of Psychosocial Factors by Patient Characteristics and Demographics on Veteran Suicide Risk Through Machine Learning Extraction of Cross-Modal Interactions

Levy

Joshua Levy, Monica Dimambro, Alos Diallo, Jiang Gui, Brian Shiner, Maxwell Levis

Cedars Sinai Medical Center, White River Junction VA Medical Center, Dartmouth College Geisel School of Medicine, Dartmouth College Geisel School of Medicine, White River Junction VA Medical Center, White River Junction VA Medical Center

Accurate prediction of suicide risk is crucial for identifying patients with elevated risk burden, helping ensure these patients receive targeted care. The US Department of Veterans Affairs’ suicide prediction model primarily leverages structured electronic health records (EHR) data. This approach largely overlooks unstructured EHR, a data format that could be utilized to enhance predictive accuracy. This study aims to enhance suicide risk models’ predictive accuracy by developing a model that incorporates both structured EHR predictors and semantic NLP-derived variables from unstructured EHR. XGBoost models were fit to predict suicide risk. The interactions identified by the model were extracted using SHAP, validated using logistic regression models, and added to a ridge regression model, which was subsequently compared to a ridge regression approach without interactions. By introducing a selection parameter, α, to balance the influence of structured (α=1) and unstructured (α=0) data, we found that intermediate α values achieved optimal performance across various risk strata, improved the performance of the ridge regression approach, and uncovered significant cross-modal interactions between psychosocial constructs and patient characteristics. These interactions highlight how psychosocial risk factors are influenced by individual patient contexts, potentially informing improved risk prediction methods and personalized interventions. Our findings underscore the importance of incorporating nuanced narrative data into predictive models and set the stage for future research that will expand the use of advanced machine learning techniques, including deep learning, to further refine suicide risk prediction methods.

N/A

Ojas

Ramwala

AI and Machine Learning in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface

Accepted proceedings paper with poster presentation

ClinValAI: A framework for developing Cloud-based infrastructures for the External Clinical Validation of AI in Medical Imaging

Ramwala

Ojas A. Ramwala, Kathryn P. Lowry, Daniel S. Hippe, Matthew P.N. Unrath, Matthew J. Nyflot, Sean D. Mooney, Christoph I. Lee

University of Washington, University of Washington, Fred Hutchinson Cancer Center, Pariveda Solutions, University of Washington, National Institutes of Health, University of Washington

Artificial Intelligence (AI) algorithms showcase the potential to steer a paradigm shift in clinical medicine, especially medical imaging. Concerns associated with model generalizability and biases necessitate rigorous external validation of AI algorithms prior to their adoption into clinical workflows. To address the barriers associated with patient privacy, intellectual property, and diverse model requirements, we introduce ClinValAI, a framework for establishing robust cloud-based infrastructures to clinically validate AI algorithms in medical imaging. By featuring dedicated workflows for data ingestion, algorithm scoring, and output processing, we propose an easily customizable method to assess AI models and investigate biases. Our novel orchestration mechanism facilitates utilizing the complete potential of the cloud computing environment. ClinValAI’s input auditing and standardization mechanisms ensure that inputs consistent with model prerequisites are provided to the algorithm for a streamlined validation. The scoring workflow comprises multiple steps to facilitate consistent inferencing and systematic troubleshooting. The output processing workflow helps identify and analyze samples with missing results and aggregates final outputs for downstream analysis. We demonstrate the usability of our work by evaluating a state-of-the-art breast cancer risk prediction algorithm on a large and diverse dataset of 2D screening mammograms. We perform comprehensive statistical analysis to study model calibration and evaluate performance on important factors, including breast density, age, and race, to identify latent biases. ClinValAI provides a holistic framework to validate medical imaging models and has the potential to advance the development of generalizable AI models in clinical medicine and promote health equity.

N/A

ramwala.pdf

Aidong

Zhang

AI and Machine Learning in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface

Accepted proceedings paper with poster presentation

Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions

Xiong

Guangzhi Xiong, Qiao Jin, Xiao Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang

University of Virginia, National Institutes of Health, University of Illinois Urbana-Champaign

The emergent abilities of large language models (LLMs) have demonstrated great potential in solving medical questions. They can possess considerable medical knowledge, but may still hallucinate and are inflexible with respect to knowledge updates. While Retrieval-Augmented Generation (RAG) has been proposed to enhance the medical question-answering capabilities of LLMs with external knowledge bases, it may still fail in complex cases where multiple rounds of information-seeking are required. To address this issue, we propose iterative RAG for medicine (i-MedRAG), where LLMs can iteratively ask follow-up queries based on previous information-seeking attempts. In each iteration of i-MedRAG, the follow-up queries are answered by a conventional RAG system, and the answers are then used to guide query generation in the next iteration. Our experiments show the improved performance of various LLMs brought by i-MedRAG compared with conventional RAG on complex questions from clinical vignettes in the United States Medical Licensing Examination (USMLE), as well as various knowledge tests in the Massive Multitask Language Understanding (MMLU) dataset. Notably, our zero-shot i-MedRAG outperforms all existing prompt engineering and fine-tuning methods on GPT-3.5, achieving an accuracy of 69.68% on the MedQA dataset. In addition, we characterize the scaling properties of i-MedRAG with different iterations of follow-up queries and different numbers of queries per iteration. Our case studies show that i-MedRAG can flexibly ask follow-up queries to form reasoning chains, providing an in-depth analysis of medical questions. To the best of our knowledge, this is the first-of-its-kind study on incorporating follow-up queries into medical RAG.
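The iterative loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `iterative_rag`, its callback parameters, and the `demo_*` stubs are hypothetical names introduced here, and a real system would replace the stubs with LLM and retrieval calls.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (follow-up query, RAG answer) pairs


def iterative_rag(
    question: str,
    generate_queries: Callable[[str, History], List[str]],
    answer_with_rag: Callable[[str], str],
    final_answer: Callable[[str, History], str],
    n_iterations: int = 2,
    n_queries: int = 2,
) -> Tuple[str, History]:
    """Ask follow-up queries for several rounds, then answer the question."""
    history: History = []
    for _ in range(n_iterations):
        # New queries are conditioned on every earlier query/answer pair.
        queries = generate_queries(question, history)[:n_queries]
        for query in queries:
            # Each follow-up is answered by a conventional RAG call.
            history.append((query, answer_with_rag(query)))
    return final_answer(question, history), history


# Hypothetical stand-ins for the LLM and the retrieval system.
def demo_generate(question: str, history: History) -> List[str]:
    return [f"follow-up {len(history) + i + 1}" for i in range(3)]


def demo_rag(query: str) -> str:
    return f"answer to {query}"


def demo_final(question: str, history: History) -> str:
    return f"{len(history)} facts used"


if __name__ == "__main__":
    answer, history = iterative_rag(
        "example question", demo_generate, demo_rag, demo_final
    )
    print(answer)
```

The two knobs `n_iterations` and `n_queries` correspond to the number of follow-up rounds and the number of queries per round, the two scaling dimensions the abstract mentions.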

N/A

xiong.pdf

Ka'ulawena

Alipio

Earth Friendly Computation: Applying Indigenous Data Lifecycles in Medical and Sovereign AI

Accepted proceedings paper with poster presentation

Earth Friendly Computation 574: Indigenous Data Sovereignty, Circular Systems, and Solarpunk Solutions for a Sustainable Future

Alipio

Ka'ulawena Alipio, Javier García-Colón, Nima Boscarino, Keolu Fox

Indigenous Futures Institute University of California San Diego, Department of Anthropology University of California San Diego, Department of Communication University of California San Diego

Recent advancements in Artificial Intelligence (AI) and data center infrastructure have brought the global cloud computing market to the forefront of conversations about sustainability and energy use. Current policy and infrastructure for data centers prioritize economic gain and resource extraction, inherently unsustainable models which generate massive amounts of energy and heat waste. Our team proposes the formation of policy around earth-friendly computation practices rooted in Indigenous models of circular systems of sustainability. By looking to alternative systems of sustainability rooted in Indigenous values of aloha ‘āina, or love for the land, we find examples of traditional ecological knowledge (TEK) that can be imagined alongside Solarpunk visions for a more sustainable future, one in which technology works with the environment, reusing electronic waste (e-waste) and improving data life cycles.

N/A

Keolu

Fox

Earth Friendly Computation: Applying Indigenous Data Lifecycles in Medical and Sovereign AI

Accepted proceedings paper with poster presentation

AI in Point-of-Care - A Sustainable Healthcare Revolution at the Edge

Rajput

Yousuf Rajput, Tarek Tarif, Akira Wolfe, Eric Dawson, Keolu Fox

University of California San Diego, University of California San Diego, University of California San Diego, Nvidia Corporation, University of California San Diego

This paper examines the integration of artificial intelligence (AI) in point-of-care testing (POCT) to enhance diagnostic speed, accuracy, and accessibility, particularly in underserved regions. AI-driven POCT is shown to optimize clinical decision-making, reduce diagnostic times, and offer personalized healthcare solutions, with applications in genome sequencing and infectious disease management. The paper highlights the environmental challenges of AI, including high energy consumption and electronic waste, and proposes solutions such as energy-efficient algorithms and edge computing. It also addresses ethical concerns, emphasizing the reduction of algorithmic bias and the need for equitable access to AI technologies. While AI in POCT can improve healthcare and promote sustainability, collaboration within the POCT ecosystem—among researchers, healthcare providers, and policymakers—is essential to overcome the ethical, environmental, and technological challenges.

https://f1000research.com/posters/13-1454

rajput.pdf

Alexis

Akerele

Overcoming health disparities in precision medicine

Accepted proceedings paper with poster presentation

Uterine fibroids show evidence of shared genetic architecture with blood pressure traits

Akerele

Alexis T. Akerele, Jacqueline A. Piekos, Jeewoo Kim, Nikhil K. Khankari, Jacklyn N. Hellwege, Todd L. Edwards, Digna R. Velez Edwards

Meharry Medical College Vanderbilt University Vanderbilt University Medical Center, Vanderbilt University Vanderbilt University Medical Center, Vanderbilt University Vanderbilt University Medical Center, Vanderbilt University Medical Center, Vanderbilt University Medical Center, Vanderbilt University Medical Center, Vanderbilt University Medical Center

Uterine leiomyomata (fibroids, UFs) are common, benign tumors in females, having an estimated prevalence of up to 80%. They are fibrous masses growing within the myometrium leading to chronic symptoms like dysmenorrhea, abnormal uterine bleeding, anemia, severe pelvic pain, and infertility. Hypertension (HTN) is a common risk factor for UFs, though less prevalent in premenopausal individuals. While observational studies have indicated strong associations between UF and HTN, the biological mechanisms linking the two conditions remain unclear. Understanding the relationship between HTN and UFs is crucial because UFs and HTN lead to substantial comorbidities adversely impacting female health. Identifying the common underlying biological mechanisms can improve treatment strategies for both conditions. To clarify the genetic and causal relationships between UF and blood pressure (BP), we conducted a bidirectional, two-sample Mendelian randomization (MR) analysis and evaluated the genetic correlations across BP traits and UFs. We used data from a multi-ancestry genome-wide association study (GWAS) meta-analysis of UFs (44,205 cases and 356,552 controls), and data from a cross-ancestry GWAS meta-analysis of BP phenotypes (diastolic BP [DBP], systolic BP [SBP], and pulse pressure [PP], N=447,758). We evaluated genetic correlation of BP phenotypes and UF with linkage disequilibrium score regression (LDSC). LDSC results indicated a positive genetic correlation between DBP and UFs (Rg=0.132, p<5.0×10^-5), and SBP and UF (Rg=0.063, p<2.5×10^-2). MR using UFs as the exposure and BP traits as outcomes indicated a relationship where UF increases DBP (odds ratio [OR]=1.20, p<2.7×10^-3). Having BP traits as exposures and UF as the outcome showed that DBP and SBP increase risk for UF (OR=1.04, p<2.2×10^-3; OR=1.00, p<4.0×10^-2; respectively). Our results provide evidence of shared genetic architecture and pleiotropy between HTN and UF, suggesting common biological pathways driving their etiologies. Based on these findings, DBP appears to be a stronger risk factor for UFs compared to SBP and PP.

N/A

akerele.pdf

Steven Christ

Jones

Overcoming health disparities in precision medicine

Accepted proceedings paper with poster presentation

The Impact of Ancestry on Genome-Wide Association Studies

Jones, Cardone

Steven Christopher Jones, Katie M. Cardone, Yuki Bradford, Sarah A. Tishkoff, Marylyn D. Ritchie

University of Pennsylvania

Genome-wide association studies (GWAS) are an important tool for the study of complex disease genetics. Decisions regarding the quality control (QC) procedures employed as part of a GWAS can have important implications for the results and their biological interpretation. Many GWAS have been conducted predominantly in cohorts of European ancestry, but many initiatives aim to increase the representation of diverse ancestries in genetic studies. The question of how these data should be combined and the consequences that genetic variation across ancestry groups might have on GWAS results warrants further investigation. In this study, we focus on several commonly used methods for combining genetic data across diverse ancestry groups and the impact these decisions have on the outcome of GWAS summary statistics. We ran GWAS on two binary phenotypes using ancestry-specific, multi-ancestry mega-analysis, and meta-analysis approaches. We found that while multi-ancestry mega-analysis and meta-analysis approaches can aid in identifying signals shared across ancestries, they can diminish the signal of ancestry-specific associations and modify their effect sizes. These results demonstrate the potential impact on downstream post-GWAS analyses and follow-up studies. Decisions regarding how the genetic data are combined have the potential to mask important findings that might serve individuals of ancestries that have been historically underrepresented in genetic studies. New methods that consider ancestry-specific variants in conjunction with the shared variants need to be developed.

N/A

Brendan

Ball

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Accepted proceedings paper with poster presentation

Cross-Species Modeling Identifies Gene Signatures in Type 2 Diabetes Mouse Models Predictive of Inflammatory and Estrogen Signaling Pathways Associated with Alzheimer’s Disease Outcomes in Humans

Ball

Brendan K. Ball, Elizabeth A. Proctor, Douglas K. Brubaker

Purdue University, Penn State University, Case Western Reserve University

Alzheimer’s disease (AD), the predominant form of dementia, is influenced by several risk factors, including type 2 diabetes (T2D), a metabolic disorder characterized by the dysregulation of blood sugar levels. Despite mouse and human studies reporting this connection between T2D and AD, the mechanism by which T2D contributes to AD pathobiology is not well understood. A challenge in understanding mechanistic links between these conditions is that evidence between mouse and human experimental models must be synthesized, but translating between these systems is difficult due to evolutionary distance, physiological differences, and human heterogeneity. To address this, we employed a computational framework called translatable components regression (TransComp-R) to overcome discrepancies between pre-clinical and clinical studies using omics data. Here, we developed a novel extension of TransComp-R for multi-disease modeling to analyze transcriptomic data from brain samples of mouse models of AD, T2D, and simultaneous occurrence of both diseases (ADxT2D) and postmortem human brain data to identify enriched pathways predictive of human AD status. Our TransComp-R model identified inflammatory and estrogen signaling pathways, encoded by mouse principal components (PCs) derived from models of T2D and ADxT2D but not AD alone, that were predictive of human AD outcomes. The same mouse PCs predictive of human AD outcomes were able to capture sex-dependent differences in human AD biology, including significant effects unique to female patients, despite the TransComp-R model being derived from data from only male mice. We demonstrated that our approach identifies biological pathways of interest at the intersection of the complex etiologies of AD and T2D, which may guide future studies into pathogenesis and therapeutic development for patients with T2D-associated AD.

N/A

ball.pdf

Bramsh

Chandio

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Accepted proceedings paper with poster presentation

Amyloid, Tau, and APOE in Alzheimer's Disease: Impact on White Matter Tracts

Chandio

Bramsh Qamar Chandio, Julio E. Villalon-Reina, Talia M. Nir, Sophia I. Thomopoulos, Yixue Feng, Sebastian Benavidez, Neda Jahanshad, Jaroslaw Harezlak, Eleftherios Garyfallidis, Paul M. Thompson

University of Southern California, University of Southern California, University of Southern California, University of Southern California, University of Southern California, University of Southern California, University of Southern California, Indiana University Bloomington, Indiana University Bloomington, University of Southern California

Alzheimer's disease (AD) is characterized by cognitive decline and memory loss due to the abnormal accumulation of amyloid-beta plaques and tau tangles in the brain; its onset and progression also depend on genetic factors such as the apolipoprotein E (APOE) genotype. Understanding how these factors affect the brain's neural pathways is important for early diagnostics and interventions. Tractometry is an advanced technique for 3D quantitative assessment of white matter tracts, localizing microstructural abnormalities in diseased populations in vivo. In this work, we applied BUAN (Bundle Analytics) tractometry to 3D diffusion MRI data from 730 participants in ADNI3 (phase 3 of the Alzheimer's Disease Neuroimaging Initiative; age range: 55-95 years, 349M/381F, 214 with mild cognitive impairment, 69 with AD, and 447 cognitively healthy controls). Using along-tract statistical analysis, we assessed the localized impact of amyloid, tau, and APOE genetic variants on the brain's neural pathways. BUAN quantifies microstructural properties of white matter tracts, supporting along-tract statistical analyses that identify factors associated with brain microstructure. We visualize the 3D profile of white matter tract associations with tau and amyloid burden in Alzheimer's disease; strong associations near the cortex may support models of disease propagation along neural pathways. Relative to the neutral genotype, APOE E3/E3, carriers of the AD-risk conferring APOE E4 genotype show microstructural abnormalities, while carriers of the protective E2 genotype also show subtle differences. Of all the microstructural metrics, mean diffusivity (MD) generally shows the strongest associations with AD pathology, followed by axial diffusivity (AxD) and radial diffusivity (RD), while fractional anisotropy (FA) is typically the least sensitive metric. 
Along-tract microstructural metrics are sensitive to tau and amyloid accumulation, showing the potential of diffusion MRI to track AD pathology and map its impact on neural pathways.

N/A

chandio.pdf

Marylyn

Ritchie

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Accepted proceedings paper with poster presentation

Integrated exposomic analysis of lipid phenotypes: Leveraging GE.db in environment by environment interaction studies

Garao Rico

Andre Luis Garao Rico, Nicole Palmiero, Marylyn D. Ritchie, Molly A. Hall

Department of Genetics University of Pennsylvania

Gene-environment interaction (GxE) studies provide insights into the interplay between genetics and the environment but often overlook multiple environmental factors' synergistic effects. This study encompasses the use of environment by environment interaction (ExE) studies to explore interactions among environmental factors affecting lipid phenotypes (e.g., HDL, LDL, total cholesterol, and triglycerides), which are crucial for disease risk assessment. We developed a novel curated knowledge base, GE.db, integrating genomic and exposomic interactions. In this study, we filtered NHANES exposure variables (available 1999-2018) to identify significant ExE using GE.db. From 101,316 participants and 77 exposures, we identified 263 statistically significant interactions (FDR p < 0.1) in discovery and replication datasets, with 21 interactions significant for HDL-C (Bonferroni p < 0.05). Notable interactions included docosapentaenoic acid (22:5n-3) (DPA) - arachidic acid (20:0), stearic acid (18:0) - arachidic acid (20:0), and blood 2,5-dimethylfuran - blood benzene associated with HDL-C levels. These findings underscore GE.db's role in enhancing -omics research efficiency and highlight the complex impact of environmental exposures on lipid metabolism, informing future health strategies.

N/A

Anni

Moore

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Accepted proceedings paper with poster presentation

Connecting intermediate phenotypes to disease using multi-omics in heart failure

Moore

Anni Moore*, Rasika Venkatesh*, Michael G. Levin, Scott M. Damrauer, Nosheen Reza, Thomas P. Cappola, Marylyn D. Ritchie

University of Pennsylvania Perelman School of Medicine, University of Pennsylvania Perelman School of Medicine, University of Pennsylvania Perelman School of Medicine, University of Pennsylvania Perelman School of Medicine, University of Pennsylvania Perelman School of Medicine, University of Pennsylvania Perelman School of Medicine, University of Pennsylvania Perelman School of Medicine

Heart failure (HF) is one of the most common, complex, heterogeneous diseases in the world, with 1-3% of the global population living with the condition. Progression of HF can be tracked via MRI measures of structural and functional changes to the heart, namely in the left ventricle (LV), including ejection fraction, mass, end-diastolic volume, and end-systolic volume. Moreover, while genome-wide association studies (GWAS) have been a useful tool to identify candidate variants involved in HF risk, they lack crucial tissue-specific and mechanistic information which can be gained from incorporating additional data modalities. This study addresses this gap by incorporating transcriptome-wide and proteome-wide association studies (TWAS and PWAS) to gain insights into genetically-regulated changes in gene expression and protein abundance in precursors to HF measured using MRI-derived cardiac measures as well as full-stage all-cause HF. We identified several gene and protein overlaps between LV ejection fraction and end-systolic volume measures. Many of the overlaps identified in MRI-derived measurements through TWAS and PWAS appear to be shared with all-cause HF. We implicate many putative pathways relevant in HF associated with these genes and proteins via gene-set enrichment and protein-protein interaction network approaches. The results of this study (1) highlight the benefit of using multi-omics to better understand genetics and (2) provide novel insights as to how changes in heart structure and function may relate to HF.

N/A

Manu

Shivakumar

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Accepted proceedings paper with poster presentation

Frequency of adding salt is a stronger predictor of chronic kidney disease in individuals with genetic risk

Shivakumar

Manu Shivakumar, Yanggyun Kim, Sang-Hyuk Jung, Jakob Woerner, Dokyoon Kim

University of Pennsylvania, Kyung Hee University

The incidence of chronic kidney disease (CKD) is increasing worldwide, but there is no specific treatment available. Therefore, understanding and controlling the risk factors for CKD are essential for preventing disease occurrence. Salt intake raises blood pressure by increasing fluid volume and contributes to the deterioration of kidney function. Thus, a low-salt diet is important to reduce blood pressure and prevent kidney diseases. The impact of lifestyle factors on disease occurrence or prevention may vary based on genetic factors. This study aims to investigate whether frequency of adding salt has different effects depending on genetic risk for CKD. CKD polygenic risk scores (PRS) were generated for 12,279 CKD cases identified over an average follow-up of 8 years in UK Biobank. We classified the individuals into four groups based on PRS: low (0-19%), intermediate (20-79%), high (80-94%), very high (≥ 95%). Incidence of CKD increased incrementally according to CKD PRS even after adjusting for age, sex, Townsend deprivation index, body mass index, estimated glomerular filtration rate, smoking, alcohol, physical activity, diabetes mellitus, dyslipidemia, hypertension, coronary artery diseases, and cerebrovascular diseases at baseline. Compared with the “never/rarely” group, groups that added salt more often, up to “always”, had a CKD incidence that rose in proportion to the frequency of adding salt. However, the significant association between the “always” group and incident CKD disappeared in the low PRS group. This study validated the signal from PRSs for CKD across a large cohort and confirmed that frequency of adding salt contributes to the occurrence of CKD. Additionally, it confirmed that the effect of “always” adding salt on CKD incidence is greater in those with more than intermediate CKD-PRS. This study suggests that increased salt intake is particularly concerning for individuals with genetic risk factors for CKD, underscoring the clinical importance of reducing salt intake for these individuals.
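The four PRS groups named in this abstract (0-19%, 20-79%, 80-94%, ≥ 95%) amount to percentile binning, which can be sketched as below. This is an illustrative sketch only: the function name `prs_groups`, the rank-based percentile definition, and the tie handling are assumptions made here, not details taken from the study.

```python
from bisect import bisect_right
from typing import List, Sequence

# Lower percentile bounds of the "intermediate", "high" and "very high" groups.
PRS_CUTS = [20, 80, 95]
PRS_LABELS = ["low", "intermediate", "high", "very high"]


def prs_groups(scores: Sequence[float]) -> List[str]:
    """Label each score by its percentile rank within the cohort."""
    n = len(scores)
    # Indices of the scores, ordered from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [""] * n
    for rank, i in enumerate(order):
        # Percentile in [0, 100); ties are broken by input order (assumption).
        pct = 100.0 * rank / n
        groups[i] = PRS_LABELS[bisect_right(PRS_CUTS, pct)]
    return groups


if __name__ == "__main__":
    demo_scores = [0.1 * k for k in range(100)]
    labels = prs_groups(demo_scores)
    print({name: labels.count(name) for name in PRS_LABELS})
```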

N/A

Cagri

Ozdemir

Translating Big Data Imaging Genomics Findings to the Individual: Prediction of Risks and Outcomes in Neuropsychiatric Illnesses

Accepted proceedings paper with poster presentation

A Dynamic Model for Early Prediction of Alzheimer’s Disease by Leveraging Graph Convolutional Networks and Tensor Algebra

Ozdemir

Cagri Ozdemir, Mohammad Al Olaimat, Serdar Bozdag, Alzheimer’s Disease Neuroimaging Initiative

Department of Computer Science and Engineering University of North Texas, Department of Mathematics University of North Texas, BioDiscovery Institute University of North Texas, Center for Computational Life Sciences University of North Texas

Alzheimer's disease (AD) is a neurocognitive disorder that deteriorates memory and impairs cognitive functions. Mild Cognitive Impairment (MCI) is generally considered an intermediate phase between normal cognitive aging and more severe conditions such as AD. Although not all individuals with MCI will develop AD, they are at an increased risk of developing AD. Diagnosing AD once strong symptoms are already present is of limited value, as AD leads to irreversible cognitive decline and brain damage. Thus, it is crucial to develop methods for the early prediction of AD in individuals with MCI. Recurrent Neural Networks (RNN)-based methods have been effectively used to predict the progression from MCI to AD by analyzing electronic health records (EHR). However, despite their widespread use, existing RNN-based tools may introduce increased model complexity and often face difficulties in capturing long-term dependencies. In this study, we introduced a novel Dynamic deep learning model for Early Prediction of AD (DyEPAD) to predict MCI subjects’ progression to AD utilizing EHR data. In the first phase of DyEPAD, embeddings for each time step or visit are captured through Graph Convolutional Networks (GCN) and aggregation functions. In the final phase, DyEPAD employs tensor algebraic operations for frequency domain analysis of these embeddings, capturing the full scope of evolutionary patterns across all time steps. Our experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and National Alzheimer’s Coordinating Center (NACC) datasets demonstrate that our proposed model outperforms or is on par with the state-of-the-art and baseline methods.

N/A

ozdemir.pdf

Yidi

Huang

AI and Machine Learning in Clinical Medicine

Accepted proceedings paper with oral presentation

PGxQA: A Resource for Evaluating LLM Performance for Pharmacogenomic QA Tasks

Keat

Karl Keat, Rasika Venkatesh, Yidi Huang, Rachit Kumar, Sony Tuteja, Katrin Sangkuhl, Binglan Li, Li Gong, Michelle Whirl-Carrillo, Teri E. Klein, Marylyn D. Ritchie, Dokyoon Kim

University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, Stanford University, Stanford University, Stanford University, Stanford University, Stanford University, University of Pennsylvania, University of Pennsylvania

Pharmacogenetics represents one of the most promising areas of precision medicine, with several guidelines for genetics-guided treatment ready for clinical use. Despite this, implementation has been slow, with few health systems incorporating the technology into their standard of care. One major barrier to uptake is the lack of education and awareness of pharmacogenetics among clinicians and patients. The introduction of large language models (LLMs) like GPT-4 has raised the possibility of medical chatbots that deliver timely information to clinicians, patients, and researchers with a simple interface. Although state-of-the-art LLMs have shown impressive performance at advanced tasks like medical licensing exams, in practice they still often provide false information, which is particularly hazardous in a clinical context. To quantify the extent of this issue, we developed a series of automated and expert-scored tests to evaluate the performance of chatbots in answering pharmacogenetics questions from the perspective of clinicians, patients, and researchers. We applied this benchmark to state-of-the-art LLMs and found that newer models like GPT-4o greatly outperform their predecessors, but still fall short of the standards required for clinical use. Our benchmark will be a valuable public resource for subsequent developments in this space as we work towards better clinical AI for pharmacogenetics.

N/A

gcb_retreat_2024_poster.pdf

Justin

Krogue

AI and Machine Learning in Clinical Medicine

Accepted proceedings paper with oral presentation

Searching for Dermatology Information Online using Images vs Text: a Randomized Study

Krogue

Justin D Krogue, Rory Sayres, Jay Hartford, Amit Talreja, Pinal Bavishi, Natalie Salaets, Kimberley Raiford, Jay Nayar, Rajan Patel, Yossi Matias, Greg S Corrado, Dounia Berrada, Harsh Kharbanda, Lou Wang, Dale R Webster, Quang Duong, Peggy Bui, Yun Liu

Google

Background: Skin conditions are prevalent globally and significantly impact anxiety and morbidity. While individuals commonly use text-based search (e.g., “red rash on arm”) to investigate skin concerns, this process is often hindered by challenges in accurately describing lesion morphology. This study evaluates user experiences with an artificial intelligence image-based search compared to traditional text-based search for identifying skin conditions.

Methods: An internet-based survey recruited 372 respondents from a commercial survey panel, including participants willing to photograph their visible skin concerns. Using the Google mobile app, respondents conducted both text-based (Google Search) and image-based (Google Lens) searches, with the order randomized. Satisfaction across six dimensions was recorded for each modality, and preferences were compared.

Results: Of the respondents, 44% identified as women, 86% as White, and 41% were over age 45. 81.5% and 63.5% were at least moderately familiar with text-based and image-based search respectively. Respondents expressed high satisfaction with both modalities, with over 90% at least somewhat satisfied across all dimensions. Direct comparisons revealed a preference for image-based search in 5 of 6 dimensions, with a 9.9% overall preference for image-based search (p=0.004). Notably, 82.5% (95% CI 78.2–86.3) expressed a desire to incorporate image-based search in future queries, whether alone or in combination with text-based search, with 64% of these favoring image-based search as the initial approach.

Conclusion: Despite less familiarity, participants favored image-based search for skin conditions and indicated a strong preference to integrate it into future searches. These findings highlight the potential of image-based search as a key tool for improving the accessibility and accuracy of online information regarding skin concerns.

N/A

dermatologyimagevstextsearchposter.pdf

Serena

Zhang

AI and Machine Learning in Clinical Medicine

Accepted proceedings paper with oral presentation

ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports

Rao

Vishwanatha M. Rao*, Serena Zhang*, Julian N. Acosta, Subathra Adithan, Pranav Rajpurkar

Department of Biomedical Informatics Harvard Medical School Boston, MA 02115, USA,

Department of Biomedical Informatics Harvard Medical School Boston, MA 02115, USA,

Department of Biomedical Informatics Harvard Medical School Boston, MA 02115, USA,

Department of Radiodiagnosis, Jawaharlal Institute of Postgraduate Medical Education and Research, India,

Department of Biomedical Informatics Harvard Medical School Boston, MA 02115, USA

Accurately interpreting medical images and writing radiology reports is a critical but challenging task in healthcare. Both human-written and AI-generated reports can contain errors, ranging from clinical inaccuracies to linguistic mistakes. To address this, we introduce ReXErr, a methodology that leverages Large Language Models to generate representative errors within chest X-ray reports. Working with board-certified radiologists, we developed error categories that capture common mistakes in both human and AI-generated reports. Our approach uses a novel sampling scheme to inject diverse errors while maintaining clinical plausibility. ReXErr demonstrates consistency across error categories and produces errors that closely mimic those found in real-world scenarios. This method has the potential to aid in the development and evaluation of report correction algorithms, potentially enhancing the quality and reliability of radiology reporting.

https://f1000research.com/posters/13-1451

Junwen

Wang

AI and Machine Learning in Clinical Medicine

Poster only

TPepPro: a deep learning model for predicting peptide-protein interactions

Chen

Zimeng Chen, Xiaohong Jin, Dan Yu, Qianhui Jiang, Zhuobin Chen, Bin Yan, Jing Qin, Yong Liu, Junwen Wang

Division of Applied Oral Sciences & Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China,

School of Electronic Information, Guangxi University for Nationalities, Nanning, China,

School of Pharmaceutical Sciences (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China,

School of Artificial Intelligence, Guangxi University for Nationalities, Nanning, China,

State Key Laboratory of Pharmaceutical Biotechnology, The University of Hong Kong, Hong Kong SAR, China,

HKU Shenzhen Hospital, Shenzhen, China

Motivation: Peptides and their derivatives hold potential as therapeutic agents. The rising interest in developing peptide drugs is evidenced by increasing approval rates by the US FDA. To identify the most promising peptides, the study of peptide-protein interactions is an important approach but poses considerable technical challenges. Experimentally, the transient nature of peptide-protein interactions (PepPIs) and the high flexibility of peptides contribute to elevated costs and inefficiency. Traditional docking and molecular dynamics simulation methods require substantial computational resources, and the predictive accuracy of their results remains unsatisfactory.

Results: To address this gap, we proposed TPepPro, a Transformer-based model for PepPI prediction. We trained TPepPro on a dataset of 19,187 pairs of peptide-protein complexes with both sequential and structural features. TPepPro utilizes a strategy that combines local protein sequence feature extraction with global protein structure feature extraction. Moreover, TPepPro optimizes the architecture of the structural feature neural network with a BN-ReLU arrangement, which notably reduced the computing resources required for peptide-protein interaction prediction. In comparative analysis, TPepPro reached an accuracy of 0.855, an 8.1% improvement over the second-best model, TAGPPI. TPepPro achieved an AUC of 0.922, surpassing the 0.844 of the second-best model, TAGPPI. Moreover, TPepPro identifies certain PepPIs that can be validated against previous experimental evidence, indicating the efficiency of TPepPro in detecting high-potential PepPIs that would be helpful for amino acid drug applications.

https://doi.org/10.1093/bioinformatics/btae708

Jaehyeok

Jang

AI and Machine Learning in Clinical Medicine

Poster only

Deep learning model for the prognostic prediction of acute inflammation of the central nervous system using T2-weighted brain magnetic resonance imaging

Choi

Bo Kyu Choi, M.D.

Jaehyeok Jang, M.D.

Yoonhyeok Choi

Kyung Min Kim, M.D.

Yu Rang Park, Ph.D.

Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea.

Department of Neurology, Yonsei University College of Medicine, Seoul, Republic of Korea.

Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea.

Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea.

Department of Neurology, Yonsei University College of Medicine, Seoul, Republic of Korea.

Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea.

Acute inflammation of the central nervous system (CNS) caused by various pathogens could lead to poor prognosis, making rapid diagnosis and treatment crucial. We used a modified DenseNet-169-based deep learning model for the prognostic prediction of CNS acute inflammation using brain MRI data.

T2-weighted brain MRI images were collected retrospectively from patients with acute CNS inflammation who were admitted to a tertiary referral hospital between January 2010 and December 2023. Data obtained after January 2020 were utilized as external test sets. After preprocessing using the FastSurfer library, the T2WI was converted into a 3D NumPy array. Prognosis prediction was performed by categorizing patients based on their modified Rankin Scale scores at discharge, with scores of 3 or higher classified as poor outcomes and scores below 3 as good outcomes. The MRI data of the entire patient cohort were used for training, and to account for differences in prognosis based on the cause of inflammation, the data were divided into four groups, with training conducted separately for each group. The performance of each model was assessed using accuracy and area under the receiver operating characteristic curve (AUROC).
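The labeling and splitting rules described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, and the exact cutoff date is an assumption, since the abstract says only that data obtained after January 2020 formed the external test set.

```python
from datetime import date

# mRS >= 3 at discharge is a poor outcome, as stated in the abstract.
POOR_OUTCOME_THRESHOLD = 3
# Assumed cutoff; the abstract does not give an exact date.
EXTERNAL_TEST_START = date(2020, 1, 1)


def outcome_label(mrs_at_discharge: int) -> int:
    """Return 1 for a poor outcome (mRS >= 3) and 0 for a good outcome."""
    if not 0 <= mrs_at_discharge <= 6:
        raise ValueError("modified Rankin Scale scores range from 0 to 6")
    return 1 if mrs_at_discharge >= POOR_OUTCOME_THRESHOLD else 0


def is_external_test(scan_date: date) -> bool:
    """Return True if a scan falls in the later, held-out time period."""
    return scan_date >= EXTERNAL_TEST_START
```

Splitting by acquisition date rather than at random keeps the test set temporally separate from the training data.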

Out of a total of 570 patients, 483 were used for training, and 70 were used for validation. After training on the entire patient cohort using the deep learning model, testing on the external dataset yielded an AUROC of 0.7804 and an accuracy of 0.8231. When patients were classified by pathogen and separate models were built and trained for each group, the results showed AUROC values of 0.8737 for autoimmune, 0.6667 for bacterial, 1.0000 for tuberculosis, and 0.6513 for viral cases.

The deep learning model, which uses T2-weighted 3D brain MRI data, could predict the prognosis of acute CNS inflammation.

This study was supported by the Hyundai Motor Chung Mong-Koo Foundation.

https://f1000research.com/posters/13-1444

20241128_psb2024_mri_poster_final.pdf

Hyunjun

Choi

AI and Machine Learning in Clinical Medicine

Poster only

Enhanced QTc Interval Monitoring in the CSICU: Evaluating the Impact of Synthetic Data and Machine Learning Techniques

Choi

Hyunjun Choi, Debbie Lin Teodorescu, Trevor Mears, Gizem Bilgili, Xi Li, Jui-Hsuan Chang, Nicholas Matsumoto, Miguel E. Hernandez, Zhiping Paul Wang, Bernice Coleman, Jason H. Moore

Department of Computational Biomedicine, Center for Artificial Intelligence Research and Education, Cedars Sinai Medical Center, Smidt Heart Institute, Cedars Sinai Medical Center, Smidt Heart Institute, Cedars Sinai Medical Center, Pulse Heart Institute, Nursing Research and Quality Department, Brawerman Nursing Institute, Cedars Sinai Medical Center, Department of Computational Biomedicine, Center for Artificial Intelligence Research and Education, Cedars Sinai Medical Center, Smidt Heart Institute, Cedars Sinai Medical Center, Department of Computational Biomedicine, Center for Artificial Intelligence Research and Education, Cedars Sinai Medical Center

Monitoring and intervening when QTc ≥ 500 milliseconds in electrocardiograms (ECGs) is a proven method to reduce sudden cardiac death (SCD) risk in critically ill patients. The gold standard for QTc monitoring involves repeated manual measurements and analysis of 12-lead electrocardiograms throughout a period of elevated risk. The significant financial and human resource costs have led to spotty adoption. While automated, continuous monitoring of QTc is available on certain telemetry setups, such surveillance has been found to correlate imperfectly with the gold standard on linear correlation measures as well as classification methods. Many clinical and electrophysiological variables have been proposed to affect automated telemetric QTc performance, but the labor-intensiveness and cost of data collection have limited clinical studies to <150 samples. The scarcity of validated patient clinical data poses a significant obstacle to the development of precise machine learning algorithms.

This preliminary study explores potential approaches to the data scarcity challenge in developing machine learning models for QTc interval monitoring in a cardiac surgical intensive care unit. It investigates the use of synthetic data generation techniques as a possible adjunct to manual 12-lead ECG measurements, to screen potential classification models and to assess the value as well as extent of further data collection needed in a cost-effective manner. We selected high-performing machine learning models that were trained on the original datasets, synthetic datasets, and combined datasets using Aliro, an AI-driven data science tool.

Under specific hyperparameter configurations optimized for both synthetic and combined datasets, the trained Gradient Boosting Classifier (GBC) and K-Nearest Neighbors (KNN) models demonstrated promising performance metrics on the independent test set. Both models achieved a Positive Predictive Value (PPV) of approximately 70% or higher and a Negative Predictive Value (NPV) of approximately 95% or higher. It is crucial to verify these findings using additional datasets.
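For readers less familiar with the two metrics reported above, the following minimal Python sketch shows how PPV and NPV are derived from a confusion matrix. The function name is hypothetical, and the counts in the usage note are made up for illustration; they are not the study's data.

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): fraction of positive predictions that are correct.
    NPV = TN / (TN + FN): fraction of negative predictions that are correct.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive and one negative prediction")
    return tp / (tp + fp), tn / (tn + fn)
```

With hypothetical counts of 7 true positives, 3 false positives, 38 true negatives, and 2 false negatives, `predictive_values(7, 3, 38, 2)` gives a PPV of 0.70 and an NPV of 0.95. Both values depend on the prevalence of QTc prolongation in the test set as well as on the model.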

https://github.com/CenterAIResearch/SYN_DATA_QTc/blob/main/Poster/QTC_Syn_poster_psb2025-Custom_40x32.pdf

qtc_syn_poster_psb2025custom_40x32.pdf

Sooyoung

Jang

AI and Machine Learning in Clinical Medicine

Poster only

Multimodal Weakly Supervised Multiple Instance Learning model for Prediction of Remission Failure in Pediatric Crohn’s Disease

Jang

Sooyoung Jang, MD*1, Min Kyoon Yoo*1, Eun Joo Lee, MD, PhD2, Sowon Park, MD, PhD2, Hyeji Lim, MD2, JaeSeong Hong1, Jong Hyun Kim, MS3, Bo Kyu Choi, MD, PhD1, Hong Koh, MD, PhD2, Yu Rang Park, PhD1

1Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea.

2Division of Gastroenterology, Hepatology and Nutrition, Department of Pediatrics, Yonsei University College of Medicine, Severance Fecal Microbiota Transplantation Center, Severance Hospital, Seoul, Republic of Korea.

3LG AI Research, Seoul, Republic of Korea.

*Sooyoung Jang and Min Kyoon Yoo contributed equally to this research.

Background

Crohn's disease (CD) is a chronic inflammatory bowel disease characterized by skip lesions and transmural inflammation throughout the gastrointestinal tract. CD prognosis assessment at the time of diagnosis relies on clinical parameters, endoscopic findings, and magnetic resonance enterography (MRE) imaging. Deep learning approaches for predicting CD remission failure remain unexplored, despite its clinical significance in disease progression. Multimodal data integration remains challenging in endoscopy and MRE analysis, as these modalities generate multiple images per patient requiring labor-intensive image-by-image labeling.

Methods

We developed MWMIL (Multimodal Weakly Supervised Multiple Instance Learning), a novel deep learning framework that integrates endoscopic images, MRE images, and clinical data using weakly supervised multiple instance learning. The model implements clustering-constrained attention multiple instance learning for endoscopic and MRE images, followed by multimodal feature integration with attention-based fusion. Model evaluation was performed using 10-fold cross-validation.

Results

We analyzed data from Severance Hospital, including 127 pediatric CD patients with 5,217 endoscopic images, 181,497 MRE images, and 49 clinical parameters. MWMIL demonstrated an AUROC of 0.824, exhibiting superior performance compared to unimodal approaches using XGBoost with clinical data (AUROC 0.685), CLAM with MRE images (AUROC 0.730), and CLAM with endoscopic images (AUROC 0.808). The model surpassed clinician performance across sensitivity, specificity, and accuracy metrics. The attention mechanism identified clinically relevant features, with severe inflammatory lesions showing high attention scores in remission failure cases.

Conclusions

This investigation establishes remission failure in CD as a distinguishable prognostic indicator at initial diagnosis. MWMIL demonstrates the efficacy of weakly supervised multiple instance learning in analyzing complex medical imaging data without image-level annotations, advancing the framework for multimodal data integration in comprehensive clinical assessment scenarios.

https://f1000research.com/posters/13-1425

20241125_psb2024_mwmil_poster.pdf

Olga

Lyudovyk

AI and Machine Learning in Clinical Medicine

Poster only

Predicting T-cell receptor–epitope specificity from sequence

Lyudovyk

Olga Lyudovyk, Artem Streltsov, Yuval Elhanati, Quaid Morris, Benjamin Greenbaum

Weill Cornell Medicine, Cornell University, Memorial Sloan Kettering Cancer Center, Memorial Sloan Kettering Cancer Center, Memorial Sloan Kettering Cancer Center

Accurately predicting T-cell receptor (TCR) specificity for antigens could have wide practical applications for many disease areas including cancer immunotherapy and cell therapies. We explored the ability of a sequence-based transformer model, BERTie, to predict TCR-epitope specificity. Relevant for clinical applications, BERTie requires only the CDR3 of TCR beta chains to predict TCR specificity. BERTie has high accuracy for predicting new TCRs for the epitopes for which training data exists and generalizes well to epitopes similar in sequence to epitopes in the training data. However, neither BERTie nor any other evaluated sequence-based model generalizes to cancer neoantigen epitopes, likely because these neoantigens are very different from the largely viral antigens in the training data. Cancer neoantigen-specific experimental datasets and potentially approaches beyond sequence are required.

https://f1000research.com/posters/13-1469

bertie_for_psb.pdf

Ludovica

Montanucci

AI and Machine Learning in Clinical Medicine

Poster only

Predicting variant pathogenicity and functional effects in glutamate receptors: transfer learning from NMDA to AMPA receptors.

Montanucci

Ludovica Montanucci, Tobias Brünger, and Dennis Lal

Department of Neurology, and Center for Neurogenetics, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA.

Glutamate receptors, in particular N-methyl-D-aspartate receptors (NMDAR) and α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid receptors (AMPAR), are ion channels mainly expressed in the brain and central nervous system, and they mediate most of the excitatory neurotransmission underlying complex functions such as learning and memory. Genetic variants in the genes encoding these receptors have been associated with severe, heterogeneous neurologic and neurodevelopmental diseases, often with early onset. Predicting the effect of genetic variants in these genes on both a phenotypic and a molecular level is crucial to improve diagnosis of rare monogenic diseases and to prescribe personalized treatments. Indeed, on a molecular level, the genetic variants can lead to an increase or a partial or complete decrease of the glutamate receptor function, conditions which require opposite therapeutic treatments.

We recently developed two machine learning predictors to predict pathogenicity and functional effect of variants in NMDAR, by assembling 223 missense variants from 370 patients, along with 640 control variants from the general population, and 160 missense variants characterized by electrophysiological readouts.

By mapping these variants onto the NMDAR protein structures, we found that spatial proximity to ligands bound to the agonist and antagonist binding sites is a key predictive feature for variant pathogenicity and molecular functional consequences. Leveraging this feature and additional evolutionary and biophysical features, we developed two machine learning-based predictors: a pathogenicity predictor for NMDAR missense variants, which outperforms currently available predictors (AUC=0.960, MCC=0.778), and the first binary predictor of molecular function for NMDAR missense variants (AUC=0.756, MCC=0.432).

In the case of AMPA receptors, however, the number of variants for which pathogenicity or functional effect is known is much lower and is not sufficient to develop ad hoc predictors. Here we explore the possibility of using transfer learning to adapt the developed NMDAR predictors to AMPAR predictors.

N/A

Serguei

Pakhomov

AI and Machine Learning in Clinical Medicine

Poster only

Automated Neural Nursing Assistant (ANNA): A preliminary feasibility study

Pakhomov

Serguei Pakhomov, PhD

Jacob Solinsky

Caitlynn Olson

Riley Stuckey

Martin Michalowski, PhD

Ashley Petersen, PhD

Veronika Bachanova, MD

University of Minnesota College of Pharmacy, University of Minnesota College of Pharmacy, University of Minnesota Masonic Cancer Center, University of Minnesota Masonic Cancer Center, University of Minnesota School of Nursing, University of Minnesota School of Public Health, University of Minnesota Medical School

We present a preliminary analysis of the feasibility of using a fully automated AI-based system for intensive monitoring of neurotoxicity, which frequently appears as a result of immunotherapy for hematologic malignancies. Early manifestations of neurotoxicity are evident in the patient’s speech in the form of aphasia and confusion and can be detected and effectively treated prior to onset of more serious impairment. We have developed the Automated Neural Nursing Assistant (ANNA) system designed to conduct a brief cognitive assessment three times per day over the telephone for 5-14 days following the infusion of CAR-T immunotherapy medication. ANNA uses a conversational agent based on a large language model to elicit spontaneous speech in a semi-structured dialogue, followed by a series of brief language-based neurocognitive tests. To determine the feasibility of using ANNA, we are conducting a prospective study in patients undergoing immunotherapy at the University of Minnesota Masonic Cancer Center. As of October 2024, 27 patients have been enrolled and 6 dropped out prior to infusion. Seven patients (33%) developed neurotoxicity (ICANS). Study investigators are currently blinded to ICANS status. A total of 45% of ANNA’s telephone assessments have been completed, with the conversational part of ANNA’s assessments containing on average 11 (SD 3.5) sentences and 82.5 (SD 48.5) words. As anticipated, the results so far indicate that acceptability of ANNA’s assessments in oncology patients is highly variable, with some patients completing all or most of their assessments and some completing only a small fraction. Any therapeutic benefit to using ANNA will only be possible with patients who engage with this technology on a regular basis, which may have implications for patient education and system improvement. It is also possible that changes in patient engagement with ANNA may be indicative of impending ICANS, which we will investigate on study completion.

https://cair.umn.edu/sites/cair.umn.edu/files/2024-11/PSB2025-poster-30x40-template.pdf

psb2025poster30x40template.pdf

Hae In

Park

AI and Machine Learning in Clinical Medicine

Poster only

Graph Attention Network Analysis of Whole Genome Sequencing Data Reveals Novel Genes Associated with Autism Spectrum Disorder

Park

Min Ji Kang, Hae In Park, Sanghyuk Lee

EWHA Research Center for Systems Biology (ERCSB), Department of Life Science, Ewha Womans University, Seoul 03760, Republic of Korea

EWHA Research Center for Systems Biology (ERCSB), Department of Life Science, Ewha Womans University, Seoul 03760, Republic of Korea

EWHA Research Center for Systems Biology (ERCSB), Department of Life Science, Ewha Womans University, Seoul 03760, Republic of Korea

Autism spectrum disorder (ASD) is a highly heritable neurodevelopmental disorder, yet its underlying genetic mechanisms remain elusive. Both coding and non-coding variants have been implicated in ASD, with rare inherited variants (RIVs) increasingly recognized for their role in ASD pathology. We developed a Graph Attention Network (GAT) model using whole genome sequencing (WGS) data from 2,186 individuals in a Korean ASD family cohort. Our model, which consists of 1,138 genes with RIVs and incorporates the STRING network, achieved an accuracy of 0.9018 and an AUC of 0.8985 in identifying ASD patients. Subsequent analysis of attention scores identified 89 significant genes associated with ASD, including 41 novel genes. Functional analysis revealed enrichment in ASD-related pathways such as synapse function, calcium activity, and cell communication. Thus, our RIV-based GAT model successfully classifies ASD patients and identifies a novel class of genes and variants that have not yet been associated with ASD.

N/A

psbposter_haeinpark.pdf

Graham

Schultz

AI and Machine Learning in Clinical Medicine

Poster only

Exploratory Analysis of Genomic Patterns in Human Cytomegalovirus Using Genome-Scale Language Models

Schultz

Graham M. Schultz, Olivia Daigle, John P. Hudson, Christian Darabos, Pierce Longmire, Felicia Goodrum, Giovanni Bosco, Carly A. Bobak

Dartmouth College, Dartmouth College, Dartmouth College, Dartmouth College, University of Arizona, University of Arizona, Dartmouth College, Dartmouth College

Genomic language models (gLMs) represent a transformative tool in genomics, adapting principles from natural language processing (NLP) to interpret the "language" of genomes. They are uniquely suited to analyzing genomic sequences due to their ability to capture both local and global dependencies, which is essential for understanding structural variations, regulatory elements, and evolutionary dynamics. Unlike traditional methods that rely heavily on predefined biological assumptions, gLMs leverage massive datasets to learn complex patterns. GenSLM distinguishes itself as a foundation model trained on over 110 million prokaryotic gene sequences, further fine-tuned with 1.5 million SARS-CoV-2 genomes. This extensive pretraining allows it to generalize across genomic contexts, offering insights into conserved mechanisms and novel variations without requiring extensive domain-specific customization.

This research applies GenSLM to the human cytomegalovirus (HCMV) genome, a significant pathogen with implications for immunocompromised populations. Using a sliding window approach to capture local and global genomic variations, the study tokenized HCMV sequences at the 3-mer level and utilized high-performance computing (HPC) to process genomic embeddings, visualized with t-SNE. The model identified key HCMV genes such as UL82 and UL19. UL82, encoding the tegument protein pp71, enhances early gene expression and disrupts host antiviral responses, while UL19 modulates immune receptor pathways, underscoring its importance in immune evasion.
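The windowing and tokenization step described above can be illustrated with a short Python sketch. It is a simplified stand-in, not the GenSLM tokenizer or the authors' pipeline: the function names, window size, and stride are hypothetical, and the sketch assumes non-overlapping 3-mers within each window.

```python
from typing import Iterator


def sliding_windows(seq: str, window: int, stride: int) -> Iterator[tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window over seq."""
    for start in range(0, len(seq) - window + 1, stride):
        yield start, seq[start:start + window]


def tokenize_3mers(seq: str) -> list[str]:
    """Split seq into non-overlapping 3-mers, dropping any trailing remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical usage: tokenize every window of a toy sequence.
toy_genome = "ATGCGTAAC"
tokens_per_window = {
    start: tokenize_3mers(chunk)
    for start, chunk in sliding_windows(toy_genome, window=6, stride=3)
}
```

Overlapping windows let each position be seen in more than one context, at the cost of embedding some regions more than once.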

A key finding from the statistical analyses is a significant over-representation of high-attention 3-mers in non-coding regions (χ² = 21.481, df = 3, p < 0.0001), highlighting the ability of GenSLM to capture regulatory features often overlooked in traditional genomic analyses. Attention mechanisms in GenSLM identified biologically significant genes across all levels of conservation, emphasizing its ability to prioritize regions with functional significance.
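The reported p-value can be checked from the test statistic alone. For 3 degrees of freedom, the chi-square upper-tail probability has the closed form erfc(√(x/2)) + √(2x/π)·e^(−x/2), so the check needs only the Python standard library. The function name below is hypothetical, and the sketch checks only the arithmetic, not the underlying contingency table, which the abstract does not give.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df = 3."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Statistic reported in the abstract; the result is below the stated 0.0001 bound.
p_value = chi2_sf_df3(21.481)
```

As a sanity check, the same function returns approximately 0.05 at the familiar df = 3 critical value of 7.815.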

This study shows that gLMs trained on diverse viral datasets can identify conserved mechanisms and novel genomic features, highlighting GenSLM's potential to advance diagnostics, therapies, and our understanding of HCMV biology.

N/A

poster_draft_cab.pdf

Lisa

Shieh

AI and Machine Learning in Clinical Medicine

Poster only

Using AI for Detecting Clinical Deterioration: Insights and Responses from the Care Team

Shieh

Lisa Shieh, MD, PhD, Yejin Jeong, BA, Margaret Smith, MBA, Robert Gallo, MD, Jerri Westphal, MSN, RN, Aubrey Florom-Smith, PhD, RN, Lisa Knowlton, MD, Steven Lin, MD

Stanford University School of Medicine

Context:

An AI clinical deterioration model-enabled intervention implemented at a large academic medical center in 2021 was recently reported to have decreased the absolute risk of care escalations among inpatients by 10.4% over two years. This study aims to evaluate the experiences of nurses and physicians participating in this AI-enabled workflow.

Objective:

Assess the care team’s experiences and gather feedback on the AI-enabled clinical deterioration pathway, including both the prediction tool and collaborative workflows that follow.

Study Design and Analysis:

From September 2023 to February 2024, our team distributed a comprehensive survey to the care team. We analyzed the survey results using statistical methods, and free-text data were thematically analyzed using Stanford’s secure instance of GPT-4.

Population:

Attending physicians, residents, interns, bedside nurses, resource nurses, and float nurses from general surgery and hospital medicine inpatient units who had access to the AI-enabled clinical deterioration pathway. The survey analysis cohort consisted of 125 participants.

Intervention:

AI-enabled mobile alerts, and EHR best practice alerts based on Epic Deterioration Index prediction model scores, followed by a standardized, multidisciplinary bedside huddle.

Results:

Both physicians and nurses often agreed with the tool’s determination of patients at high risk (M = 2.608 (SD = 0.69), p < 0.001). However, there were differences between physicians (n = 44) and nurses (n = 77) across various aspects, including knowledge gain, patient outcomes, satisfaction levels, perceived pathway value, and sustainability (p < .001). Nurses consistently showed higher satisfaction than physicians, who were neutral about the tool, workflow, and impact on patient care. Most participants expressed interest in obtaining more information about the model and suggested modifications to the workflow, documentation, and team huddle.

Conclusions:

Both physicians and nurses generally concurred with the tool’s identification of high-risk patients. Nurses displayed more favorable attitudes, engagement, and outlook toward the intervention.

https://f1000research.com/posters/13-1440

2025pbs_cdi_32x40____readonly.pdf

William

Wright

AI and Machine Learning in Clinical Medicine

Poster only

An open-source synergy screening platform accelerates discovery of drug combinations

Wright

William C. Wright, Min Pan, Gregory A. Phelps, Jonathan Low, Duane Currier, Marlon Trotter, Richard E. Lee, Taosheng Chen, Paul Geeleher

William C. Wright, St. Jude Children's Research Hospital - dept of Computational Biology

Min Pan, St. Jude Children's Research Hospital - dept of Computational Biology

Gregory A. Phelps, St. Jude Children's Research Hospital - dept of Chemical biology & Therapeutics

Jonathan Low, St. Jude Children's Research Hospital - dept of Chemical biology & Therapeutics

Duane Currier, St. Jude Children's Research Hospital - dept of Chemical biology & Therapeutics

Marlon Trotter, St. Jude Children's Research Hospital - dept of Chemical biology & Therapeutics

Richard E. Lee, St. Jude Children's Research Hospital - dept of Chemical biology & Therapeutics

Taosheng Chen, St. Jude Children's Research Hospital - dept of Chemical biology & Therapeutics

Paul Geeleher, St. Jude Children's Research Hospital - dept of Computational Biology

Many diseases require drug combinations for effective treatment, yet the discovery process remains slow and challenging due to technical limitations arising from the exponential nature of testing drugs in combination. Modern liquid handling systems aim to address this by enabling complex and highly customizable experimental designs that were not previously possible. However, throughput remains limited by the lack of comprehensive experimental and analytical tools for maximizing combination screening efficiency. Here we introduce Combocat, an open-source, end-to-end platform for ultrahigh-throughput drug combination screening that integrates acoustic liquid handler protocols and computational models. Using Combocat, we generated a reference dataset of over 290,000 unique drug combination measurements in a dense 10 × 10 matrix format across various drugs and cell types. We leveraged this dataset to build a machine learning model that accurately estimates drug combination effects from sparse matrices, dramatically reducing the experimental measurements required. As proof of concept, we screened 9,045 drug combinations in a neuroblastoma cell line, representing the largest dense combination screen in a single cell line to date, achieved using minimal resources. Overall, our platform leverages advancements in drug plating technologies and machine learning to present a scalable solution for accelerating the discovery of novel drug combinations.

https://github.com/wcwr/PSB_2025/blob/main/Wright_Poster.pdf

Pui Ying

Yew

AI and Machine Learning in Clinical Medicine

Poster only

Identifying Individualized Multiple Chronic Condition Patterns Associated with Dementia

Yew

Pui Ying Yew, Chih-Lin Chi

University of Minnesota Institute for Health Informatics, University of Minnesota Institute for Health Informatics

Studies have shown that people with multiple chronic conditions (MCC) are more susceptible to developing Alzheimer’s Disease and Related Dementia (ADRD). While many have investigated the most common MCC clusters that are associated with a high risk of ADRD, few have investigated the MCC interactions that may contribute to ADRD risk. Knowing which MCC interactions might worsen cognitive decline when coupled with a patient’s existing MCC profile may help delay or prevent ADRD by preventing these particular MCC and interactions. Specifically, given a patient’s existing MCC profile, we want to understand the differences in future MCC development between patients who developed ADRD and those who did not within 3, 4, and 5 years. Here, we denote these future MCC developments as “in-between MCC” (between now and a potential future ADRD diagnosis). To address this problem and predict the in-between MCC, we propose a method called lazyDT, which uses lazy learning and a cohort-weighting approach to construct a dynamic decision tree tailored to each patient’s existing MCC profile, using other MCC as features. We also propose two metrics to evaluate the “predictability” of in-between MCC. We found that lazyDT outperformed the traditional eager learning method (eagerDT) and a random decision tree in identifying in-between MCC, especially in true positive match rate. These results suggest that one can learn MCC patterns and interactions from patients with varying degrees of similarity, ranging from exactly the same existing MCC to very different MCC, and that such learning is controlled by sample weights, enabling the learning of heterogeneous, complicated, and interactive MCC patterns. While our study can identify MCC interactions associated with ADRD risk, the causal links between these interactions require further investigation.
Our future work includes incorporating syndemic factors into individual profiles and consolidating these MCC interactions in clusters.

https://f1000research.com/posters/13-1452

psb_poster_presentation.pdf

Mike

Zack

AI and Machine Learning in Clinical Medicine

Poster only

Harnessing Advanced ML and AI Models to Accelerate Precision Medicine Recommendations Development

Zack

Mike Zack, Ioan Slobodchikov, David Sokolov, Danil Stupichev, Anastasia Yankovskiy, Allan Gobbs

PGxAI Inc., PGxAI Inc., PGxAI Inc., PGxAI Inc., PGxAI Inc., PGxAI Inc.

Background: Integrating PGx into clinical practice optimizes drug therapy, enhances efficacy, and minimizes ADRs. However, genetic complexities and the labor-intensive development of pharmacogenetic guidelines hinder timely personalized medicine.

Objective: This study aims to accelerate PGx recommendations by developing an ML pipeline that integrates drug molecular structures, physicochemical parameters, PKPD genetic profiles, and allele combinations. Utilizing AI models, we seek to enhance predictive accuracy and provide high-quality, contextually relevant recommendations, facilitating PGx in clinical practice.

Methods: We constructed a comprehensive dataset by integrating knowledge bases from PharmGKB, PubChem, and PharmVar, encompassing drug fingerprints, chemical properties, and extensive pharmacogenetic information. To address data imbalance and complexity, we employed gradient boosting classifiers (CatBoost, LightGBM, and XGBoost) with hyperparameter optimization via Optuna. For recommendation generation, we fine-tuned LLaMA 3.1 models with 8 billion and 70 billion parameters, using structured prompts, Rank-Stabilized Low-Rank Adaptation (rsLoRA), and statistical generation filtering. Generated recommendations were assessed using BLEU and ROUGE metrics.

Results: The optimized CatBoost classifier achieved an F1-score of 0.9838, precision of 1.0000, recall of 0.9681, and an ROC-AUC of 0.9991, outperforming previous models in distinguishing cases requiring pharmacogenetic recommendations. The fine-tuned LLaMA models generated high-quality recommendations, with the 70-billion parameter model attaining an average BLEU score of 0.8405 and ROUGE-1 score of 0.8695, indicating strong alignment with expert guidelines. These results demonstrate significant improvements over prior studies in both predictive accuracy and recommendation generation.
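As a quick consistency check on the classifier metrics above, the F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the two values quoted in the Results; the function name is illustrative and is not part of the authors' pipeline.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the Results for the optimized CatBoost classifier.
reported_precision = 1.0000
reported_recall = 0.9681

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")
```

Rounded to four decimals, the recomputed value agrees with the reported F1-score of 0.9838.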

Conclusion: Our integrated ML and GenAI approach effectively accelerates the development of PGx recommendations, surpassing previous models in accuracy and performance. This method opens avenues for incorporating additional data layers such as clinical outcomes and multi-omics information, and for automating recommendation updates. By facilitating the integration of personalized medicine into clinical practice, this approach has the potential to reduce ADRs, improve therapeutic efficacy, and enhance patient outcomes.

N/A

pgxai_poster_psb_20241130c.pdf

Ashley

Cordes

Earth Friendly Computation

Poster only

Navigating Tension in the Coproduction of Natural Climate Solutions (NCS) Using Soil Carbon Sequestration: Evidence from the First Phase of the Convergence to Accelerate Research on Biological Sequestration (CARBS)

Cordes

Ashley Cordes, Dhruv Modi, Madalynn Madigar

Environmental Studies Program and Department of Data Science, University of Oregon,

Environmental Studies Program, University of Oregon,

Department of English, University of Oregon

This project complicates the trending notion of “co-production” of knowledge in scientific research on Indigenous land by documenting an effort between the Kō-Kwel Nation/Coquille Indian Tribe (CIT) and a National Science Foundation (NSF)-funded carbon sequestration project, Convergence to Accelerate Research on Biological Sequestration (CARBS). There is growing awareness in the scientific community of the necessity for conducting research which respects the authority Indigenous Nations have over their lands, knowledges, and data. However, practices for achieving equitable cooperation between “outside” researchers and Indigenous communities are still being clarified. We contribute to this discourse by showing how Margaret Kovach’s “prickly pragmatics” (2021) framing of Indigenous methodologies (when applied to eDNA data) might be utilized simultaneously as a theoretical lens for understanding potential issues that arise in the course of scientific research, such as the navigation of soliciting community input, as well as a pedagogical and methodological tool. This approach opens creative avenues for scientists to critically reflect upon and actively anticipate the impacts of research on partnered Indigenous communities.

By emphasizing the relationship-building aspects of research and the necessity for following the leadership of Indigenous researchers, our framework requires that scientists actively participate in developing the tools for conducting research that abide by Indigenous cares and concerns regarding land, data, and project final outcomes. Reporting on the CARBS project—which integrates environmental DNA (eDNA) sampling, artificial intelligence (AI) modeling and soil analysis spanning multiple scientific disciplines, labs, and investigators—our investigation underscores that an approach that recognizes and navigates tensions between researchers and Indigenous communities is particularly necessary in interdisciplinary research settings that draw on extremely diverse forms of scientific expertise. While many aspects of collaborations of this kind are necessarily specific and non-transferable (i.e., not universal), this exposition serves as a useful touchpoint for navigating similar research programs.

N/A

psbposterfinal_cordes.pdf

Stephanie

Arteaga

Overcoming health disparities in precision medicine

Poster only

AI to Support Pharmacogenetics in Understudied Populations

Arteaga

Stephanie A. Arteaga, Ethan Tai, Issah Samori, Johnny G. Powell, Alp Tartici, Jan Matthias, Mihir Borkar, Russ B. Altman

Stanford University Department of Biomedical Data Science, Stanford University School of Medicine, Stanford University Department of Bioengineering, Stanford University School of Medicine, Stanford University Department of Genetics, Stanford University Department of Bioengineering, Stanford University Department of Biomedical Data Science, Stanford University Departments of Biomedical Data Science, Bioengineering, and Genetics, and School of Medicine

Regulatory agencies in the United States serve a diverse population while overseeing the safety and efficacy of therapeutics and diagnostics. This task becomes particularly challenging in the context of genetic tests and genetically informed therapeutics, where product performance may depend on individual genetic variations. Historically, most genetic and pharmacogenetic studies have focused on populations of European genetic ancestry, leaving significant gaps in understanding for non-European ancestry groups. Growing literature indicates that genetic findings do not fully transfer across ancestries. Biobanks like All of Us (AoU), the Million Veteran Program (MVP), and the UK Biobank, which have accumulated substantial health, lifestyle, exposure, and genetic information from non-European ancestry populations, provide an opportunity to address these gaps. Notably, AoU and MVP are both designed to include diverse populations and provide sufficient data for large-scale analysis, particularly for African ancestry (AA) and Latino ancestry (LA) populations. AoU offers whole-genome sequencing (WGS) data for 50,969 AA and 47,371 LA participants, while MVP includes WGS data for 21,099 AA and 5,596 LA participants, enabling the discovery of new pharmacogenetic alleles. Advances in artificial intelligence will further characterize these alleles by predicting their phenotypic consequences, helping to close the gap in knowledge about pharmacogenetics in understudied populations.

N/A

psb_conference_poster_2025.pdf

Hee Young

Cho

Overcoming health disparities in precision medicine

Poster only

Innovative Non-Invasive Treatment for Perinatal Depression: A Korean Perspective

Hee Young

Hee Young Cho, Jee Sun Lee, Jaesub Park, Min-Kyoung Kim, Sra Jung, Da Kyung Hong, Hyun-ju Kim

1Department of Obstetrics and Gynecology, Seoul National University College of Medicine,

2Department of Psychiatry, Yongin Severance Hospital, Yonsei University College of Medicine,

3Department of Psychiatry, CHA Ilsan Medical Center, CHA University,

4Department of Obstetrics and Gynecology, CHA Ilsan Medical Center, CHA University,

5Department of Psychiatry, CHA Bundang Medical Center

Objective

To evaluate the effectiveness of transcranial direct current stimulation (tDCS) in alleviating depression among perinatal women, providing insights into its potential as a standard treatment in Korean healthcare settings.

Materials and Methods

Twenty-two perinatal women recruited from four hospitals in Korea underwent tDCS treatment for four weeks. Depression levels were compared before and after the treatment. The participants received daily 30-minute tDCS sessions at 2 mA. The following scales were used to measure outcomes pre- and post-treatment: the Center for Epidemiologic Studies Depression Scale-Revised (CESD-R), Beck Depression Inventory-II (BDI-II), Montgomery-Åsberg Depression Rating Scale (MADRS), Korean version of the Edinburgh Postnatal Depression Scale (K-EPDS), and Depression Anxiety Stress Scales-21 (DASS-21). The DASS-21 was divided into depression, anxiety, and stress subscales for further analysis.

Results

The mean age of the women who participated in the study was 35.05 ± 4.89 years, and their BMI at the time of participation was 24.80 ± 5.14 kg/m². Gestational ages at the time of study participation were as follows: 7 women in the first trimester, 14 in the second trimester, and 1 in the third trimester.

The results showed significant reductions in depression across multiple scales, including CESD-R (t=3.263, p=.002), K-BDI-II (t=4.566, p<.001), MADRS (t=3.681, p<.001), and K-EPDS (t=4.536, p<.001). DASS-21 subscale analysis showed statistically significant reductions in depression (t=4.352, p<.001), anxiety (t=5.406, p<.001), and stress (t=4.115, p<.001).

Conclusion

The results suggest that tDCS is an effective intervention to reduce depression levels and promote mental health in perinatal women.

Acknowledgement

This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (RS-2023-KH134964).

https://f1000research.com/posters/13-1371

Tina

Hernandez-Bo

Overcoming health disparities in precision medicine

Poster only

Addressing Biases in Postoperative Delirium Detection: Leveraging Machine Learning and Multimodal EHR Data to Improve Identification and Outcomes

Hernandez-Bo

Yeon Mi Hwang, Malvika Pillai, Catherine Curtin, Tina Hernandez-Boussard

Stanford School of Medicine, Stanford University

Postoperative delirium, affecting up to 30% of patients after major surgeries, is a significant contributor to adverse outcomes such as falls, prolonged hospital stays, and increased mortality. Despite its impact, delirium remains under-recognized, partly due to biases in detection methods and disparities in documentation, especially for underrepresented populations. Reliance on structured EHR data often fails to account for multifaceted social biases intertwined with demographic factors. We aimed to reduce biases in the detection of postoperative delirium and improve fairness in predictive models by integrating structured and unstructured EHR data through FairEHR-CLP, a novel framework for fairness-aware clinical predictions. This was a retrospective analysis of 45,822 surgical patients aged 50 and above treated at a major academic medical center between 2012 and 2022. We identified postoperative delirium using both structured data (ICD-10 codes and Confusion Assessment Method assessments) and unstructured clinical notes analyzed with natural language processing (NLP). Using FairEHR-CLP, we applied a two-stage process: first, generating synthetic patient counterparts to simulate diverse demographic identities while preserving essential health information; second, employing contrastive learning to align patient representations across sensitive attributes. Machine learning models assessed risk factors and measured error disparities across subgroups using a novel fairness metric. Postoperative delirium was identified in 17% of patients (7,580 of 45,822), with significant associations between delirium and older age, pre-existing mental health conditions, and specific surgical procedures. Disparities in delirium detection were observed across ancestry groups, with notable variations not fully explained by clinical factors such as anesthesia type or length of stay. FairEHR-CLP demonstrated enhanced fairness and predictive accuracy compared to traditional methods.
FairEHR-CLP reduces biases in delirium detection, leveraging multimodal EHR data and fairness-aware techniques to address disparities related to ancestry and demographic factors. This framework represents a significant step toward ensuring equitable and accurate predictive healthcare models.

n/a

fairehr.pdf

Anil

Saini

Overcoming health disparities in precision medicine

Poster only

Optimizing Model Performance and Fairness Through Reweighting Techniques

Saini

Anil K. Saini, Jose Guadalupe Hernandez, Emily F. Wong, Jason H. Moore

All Authors: Department of Computational Biomedicine at Cedars-Sinai Medical Center

Machine learning models trained on real-world data may inadvertently make biased predictions that negatively impact marginalized communities. Biased predictions are especially problematic in healthcare settings, where these predictions can lead to underdiagnosis or missed diagnoses, exacerbating existing disparities in health outcomes. Reweighting is a method that can mitigate bias in model predictions by assigning a weight to each data point used during model training. Here, we compare three weighting techniques: (1) equal weights for all data points, (2) sample weights computed using only dataset characteristics, and (3) sample weights computed using a Genetic Algorithm (GA). We evaluate these approaches on two medical datasets and six publicly available datasets. Our results demonstrate that evolving sample weights with a GA optimizes both predictive performance (accuracy) and fairness (false negative rate across groups, or demographic parity) better than the alternative sample-weighting techniques.
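To make technique (2) concrete, the sketch below computes sample weights from dataset characteristics alone. The abstract does not state which formula the authors used, so this is an assumption: it applies one commonly used scheme (often called reweighing), in which each point is weighted by P(group) × P(label) / P(group, label) so that group membership and label look statistically independent to the learner. The function name and toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each sample by P(group) * P(label) / P(group, label).

    Assumed scheme for illustration only; not necessarily the
    dataset-characteristic weighting used in the poster.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Hypothetical toy data: a protected-group attribute and a binary label.
toy_groups = ["a", "a", "a", "b"]
toy_labels = [1, 1, 0, 0]

weights = reweighing_weights(toy_groups, toy_labels)
print(weights)
```

Group/label combinations that are over-represented relative to independence receive weights below 1 and under-represented ones receive weights above 1. Weights like these can typically be passed to a learner through a per-sample weight argument during training.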

https://osf.io/mgy6s/

psb__reweighting_poster.pdf

Lindsay

Fernandez-Rh

Overcoming health disparities in precision medicine

Poster only

Genetic Underpinnings of Coronary Heart Disease in Hispanic/Latino Populations

Yao Tu, Geetha Chittoor, Anne E. Justice, Zhe Wang, Alexandre Pereira, Andrea R.V.R. Horimoto, Elizabeth Frankel, Jennifer E. Below, Kari E. North, Misa Graff, Lindsay Fernandez-Rhodes

Department of Biobehavioral Health, Pennsylvania State University, University Park, Pennsylvania (YT, LFR), Department of Population Health Sciences, Geisinger Health System, Danville, Pennsylvania (GC, AEJ), The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York City, New York (ZW), Division of Aging, Brigham and Women's Hospital, Boston, Massachusetts (AP, ARVRH), Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee (EF, JEB), Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina (MG, KEN)

Hispanic/Latino (HL) populations are highly under-represented in studies of coronary heart disease (CHD) risk factors in many electronic health record (EHR)-based biobanks and cohorts, even though they are the largest ethnic minority in the US. To ensure their representation in genomic studies of CHD, we assembled 144,882 HL individuals, including 20,450 cases and 124,432 controls, with genome-wide array genotyping or sequencing and CHD phenotype information from ten studies. CHD cases were defined by at least two specific ICD-9/10 conditions or the presence of revascularization or cardiothoracic surgery in EHRs from biobanks, or by study-specific indicators in cohort studies. Genome-wide association results were adjusted for sex, age, and principal components (PCs), and filtered (MAF > 0.01, r2 > 0.7). Meta-analysis identified 8 loci that reached genome-wide significance (GWS; p<5e-8), one of which is a novel locus on chromosome 18 (rs75679416). This variant alters several binding motifs and is located upstream of a long non-coding RNA related to SALL3. The other five loci reported previously include rs55730499 in gene LPA and rs9349379 in gene PHACTR1. These two genes have been associated with CHD in non-Hispanic/Latino populations. Additionally, 114 loci reached suggestive significance (p<5e-6), of which 76 have not been reported previously. Sex-stratified analyses were run separately for 63,750 females (8.9% cases) and ~29,000 males (15.6% cases). Although meta-analysis did not reveal any sex-specific loci at GWS, 44 loci in females and 37 loci in males reached suggestive significance. A formal test of sex differences revealed 2 loci (rs116506475 and rs116120189) with suggestively significant interactions. We are adding samples from Latin America and the US, seeking independent replication, and applying fine mapping and conditional analyses to further inform the etiology of CHD in Hispanic/Latino adults.
Future work in this vein will bolster cardiovascular health equity and HL community-based interventions.

N/A

ashg_poster_2024_yao_revised_36h_48w.pdf

Jason

Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease

Poster only

Integrated proteogenomic characterization of glioblastoma evolution

Jason K. Sa

Department of Biomedical Informatics, Korea University College of Medicine, Department of Biomedical Sciences, Korea University College of Medicine,

The evolutionary trajectory of glioblastoma (GBM) is a multifaceted biological process that extends beyond genetic alterations alone. Here, we perform an integrative proteogenomic analysis of 123 longitudinal glioblastoma pairs and identify a highly proliferative cellular state at diagnosis that is replaced by activation of neuronal transition and synaptogenic pathways in recurrent tumors. Proteomic and phosphoproteomic analyses reveal that the molecular transition to a neuronal state at recurrence is marked by post-translational activation of the wingless-related integration site (WNT)/planar cell polarity (PCP) signaling pathway and B-raf proto-oncogene (BRAF) protein kinase. Consistently, multi-omic analyses of patient-derived xenograft (PDX) models mirror similar patterns of evolutionary trajectory. Inhibition of BRAF kinase impairs both the neuronal transition and the migration capability of recurrent tumor cells, phenotypic hallmarks of post-therapy progression. Combinatorial treatment with temozolomide (TMZ) and the BRAF inhibitor vemurafenib significantly extends the survival of PDX models. This study provides comprehensive insights into the biological mechanisms of glioblastoma evolution and treatment resistance, highlighting promising therapeutic strategies for clinical intervention.

N/A

Aurora Anna F

Colombo

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Accepted proceedings paper with oral presentation

Enhancing Privacy-Preserving Cancer Classification with Convolutional Neural Networks

Colombo

Aurora A. F. Colombo, Luca Colombo, Alessandro Falcetta, Manuel Roveri

Politecnico di Milano, Politecnico di Milano, Politecnico di Milano, Politecnico di Milano, Politecnico di Milano

Precision medicine significantly enhances patient prognosis by offering personalized treatments. Particularly for metastatic cancer, incorporating primary tumor location into the diagnostic process greatly improves survival rates. However, traditional methods rely on human expertise, requiring substantial time and financial resources. To address this challenge, Machine Learning (ML) and Deep Learning (DL) have proven particularly effective. Yet, their application to medical data, especially genomic data, must account for privacy due to the highly sensitive nature of the data. In this paper, we propose OGHE, a convolutional neural network-based approach for privacy-preserving cancer classification designed to exploit spatial patterns in genomic data while maintaining confidentiality by means of Homomorphic Encryption (HE). This encryption scheme allows processing directly on encrypted data, guaranteeing its confidentiality during the entire computation. The design of OGHE is specific to privacy-preserving applications, taking into account HE limitations from the outset and introducing an efficient packing mechanism to minimize the computational overhead introduced by HE. Additionally, OGHE relies on a novel feature selection method, VarScout, designed to extract the most significant features through clustering and occurrence analysis while preserving inherent spatial patterns. Coupled with VarScout, OGHE has been compared with existing privacy-preserving solutions for encrypted cancer classification on the iDASH 2020 dataset, demonstrating their effectiveness in providing accurate privacy-preserving cancer classification and in reducing latency thanks to our packing mechanism. The code is released to the scientific community.

N/A

poster_psb.pdf

Jakob

Woerner

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Accepted proceedings paper with oral presentation

Plasma protein-based and polygenic risk scores serve complementary roles in predicting inflammatory bowel disease

Woerner

Jakob Woerner, Thomas Westbrook, Seokho Jeong, Manu Shivakumar, Allison R. Greenplate, Sokratis A. Apostolidis, Seunggeun Lee, Yonghyun Nam, Dokyoon Kim

University of Pennsylvania, University of Pennsylvania, Seoul National University, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, Seoul National University, University of Pennsylvania, University of Pennsylvania

Inflammatory bowel disease (IBD), encompassing Crohn’s disease (CD) and ulcerative colitis (UC), has a significant genetic component and is increasingly prevalent due to environmental factors. Current polygenic risk scores (PRS) have limited predictive power and cannot inform time of symptom onset. Circulating proteomics profiling offers a novel, non-invasive approach for understanding the inflammatory state of complex diseases, enabling the creation of proteomic risk scores (ProRS). This study utilizes data from 51,772 individuals in the UK Biobank to evaluate the unique and combined contributions of PRS and ProRS to IBD risk prediction. We developed ProRS models for CD and UC, assessed their predictive performance over time, and examined the benefits of integrating PRS and ProRS for enhanced risk stratification. Our findings are the first to demonstrate that combining genetic and proteomic data improves IBD incidence prediction, with ProRS providing time-sensitive predictions and PRS offering additional long-term predictive value. We also show that the ProRS achieves better predictive performance among individuals with high PRS. This integrated approach highlights the potential for multi-omic data in precision medicine for IBD.

N/A

Craig

Teerlink

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Poster only

Genome-wide burden analysis reveals novel genetic determinants of survival in patients with metastatic prostate cancer receiving androgen-targeted therapies: Results from the Million Veterans Program

Candelieri-Sur

Danielle Candelieri-Surette, Dan Berlowitz, Tyler Nelson, Hannah Carter, Brent Rose, Rana McKay, Jason Vassy, Kathryn M. Pridgen, Richard Hauger, Julie A Lynch, Craig C Teerlink

VA Informatics and Computing Infrastructure (VINCI) VA Salt Lake City Health Care System Salt Lake City UT USA, University of Massachusetts Lowell Lowell MA, VA Informatics and Computing Infrastructure (VINCI) VA Salt Lake City Health Care System Salt Lake City UT USA, University of California San Diego La Jolla CA, University of California San Diego La Jolla CA, University of California San Diego La Jolla CA, VA Boston Healthcare System Boston Massachusetts USA, VA Informatics and Computing Infrastructure (VINCI) VA Salt Lake City Health Care System Salt Lake City UT USA, VA San Diego Healthcare System San Diego CA USA, VA Informatics and Computing Infrastructure (VINCI) VA Salt Lake City Health Care System Salt Lake City UT USA, VA Informatics and Computing Infrastructure (VINCI) VA Salt Lake City Health Care System Salt Lake City UT USA

Background: Literature suggests various genes influence treatment response in prostate cancer. Studies increasingly highlight the roles of genes in pharmacogenetics. Genes allowing androgens to enter cells more efficiently may increase therapy resistance, worsening outcomes in recurrent or metastatic disease. A deeper understanding of biomarkers and resistance mechanisms is essential to address this process. We conducted a genome-wide search for genes that exhibit an effect on survival in patients on androgen receptor-targeted therapies.

Methods:

We developed two models: Model 1 included 5,107 patients with metastatic prostate cancer receiving androgen deprivation therapy, while Model 2 focused on a subgroup of 2,240 patients on abiraterone. We performed 18,544 distinct gene-level tests across the human genome using all coding variants within each gene, analyzing rare (minor allele frequency < 1%) loss-of-function variants, and both rare loss-of-function and missense variants. All analyses utilized the Veterans Administration’s Million Veterans Program (MVP) whole genome Release2 genomic data (96K males). Gene-level survival analysis was performed using the seqMeta R package, which performs burden, SKAT, and SKAT-O tests. A Bonferroni correction was applied for 82,567 tests, requiring p<6.06E-7 to establish significance and p<5.00E-6 for suggestive evidence.

Results:

Gene-level tests identified OR1G1 (burden p=1.87E-7, OR=0.17), PCDHGB1 (burden p=2.92E-7, OR=0.61), and CLPB (burden p=3.34E-7, OR=0.30) as significantly associated with prostate cancer survival, while SNX18 demonstrated a suggestive association (SKAT-O p=3.40E-6) among European ancestry subjects in Model 1. Among African ancestry subjects, the best performing gene was UBD (burden p=2.61E-6, OR=0.76). Analyses of rare loss-of-function and loss-of-function or missense variants failed to achieve significant or suggestive evidence. Model 2 did not reveal any significant findings.
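The significance threshold quoted in the Methods can be reproduced with a one-line Bonferroni calculation, and the p-values reported above can then be sorted against it. This is a minimal sketch: the family-wise error rate of 0.05 is an assumption (the abstract does not state it, though it reproduces the reported threshold), and the function name is illustrative.

```python
ALPHA = 0.05          # assumed family-wise error rate (not stated in the abstract)
N_TESTS = 82_567      # number of tests stated in the Methods
SUGGESTIVE = 5.00e-6  # suggestive threshold stated in the Methods

# Bonferroni correction: divide the error rate by the number of tests.
bonferroni_threshold = ALPHA / N_TESTS
print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")


def classify(p_value: float) -> str:
    """Label a p-value against the significant and suggestive thresholds."""
    if p_value < bonferroni_threshold:
        return "significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"


# p-values as reported in the Results (burden tests, except SNX18 from SKAT-O;
# UBD is from the African ancestry analysis, the others from European ancestry).
reported_p = {
    "OR1G1": 1.87e-7,
    "PCDHGB1": 2.92e-7,
    "CLPB": 3.34e-7,
    "SNX18": 3.40e-6,
    "UBD": 2.61e-6,
}

for gene, p in reported_p.items():
    print(f"{gene}: p={p:.2e} -> {classify(p)}")
```

Under these assumptions, OR1G1, PCDHGB1, and CLPB fall below the corrected threshold, while SNX18 and UBD fall in the suggestive range.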

Conclusions: Our results detected novel associations with genes involved in cell signaling and intracellular protein trafficking that suggest new druggable targets for prostate cancer. Follow-up in external datasets for validation is warranted.
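
The thresholds stated in Methods can be checked with a few lines of arithmetic. The sketch below assumes a family-wise error rate of 0.05, which the abstract does not state but which reproduces the reported cutoff; the p-values are those listed in Results, and the `classify` helper is hypothetical.

```python
# Bonferroni arithmetic for the thresholds quoted in the abstract.
ALPHA = 0.05          # assumed family-wise error rate (not stated in the abstract)
N_TESTS = 82_567      # number of tests corrected for, from Methods
SUGGESTIVE = 5.00e-6  # suggestive-evidence cutoff, from Methods

bonferroni = ALPHA / N_TESTS


def classify(p: float) -> str:
    """Label a p-value using the two cutoffs described in Methods."""
    if p < bonferroni:
        return "significant"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not significant"


# Model 1 p-values for European ancestry subjects, copied from Results.
reported = {"OR1G1": 1.87e-7, "PCDHGB1": 2.92e-7, "CLPB": 3.34e-7, "SNX18": 3.40e-6}
labels = {gene: classify(p) for gene, p in reported.items()}

print(f"Bonferroni threshold: {bonferroni:.2E}")
for gene, label in labels.items():
    print(gene, label)
```

Under that assumption the computed cutoff rounds to the 6.06E-7 given in Methods, and the four genes fall into the categories the abstract reports.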

https://f1000research.com/posters/13-1433

pca_burden_survival_poster_psb_241125.pdf

Zinhle

Cindi

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Poster only

Imputation from low-pass whole genome sequencing data with GLIMPSE2 versus TOPMed

Cindi

Zinhle Cindi, Yuki Bradford, Phumla Sinxadi, David W. Haas, Marylyn D. Ritchie

Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA,

Division of Clinical Pharmacology, Department of Medicine, University of Cape Town, Cape Town, South Africa, SAMRC/UCT Platform for Pharmacogenomics Research and Translation, South African Medical Research Council, Cape Town, South Africa,

Vanderbilt University Medical Center, Nashville, TN 37232, USA,

Meharry Medical College, Nashville, TN 37208, USA,

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA

Background: DNA sequencing technologies are emerging as alternatives to genotyping arrays. Sequencing enables comprehensive probing of the entire genome with a specified depth of coverage. Whole-genome sequencing (WGS) performed with a read depth of 30x to 50x is presently cost-prohibitive for most large-scale studies. Low-pass whole-genome sequencing (lp-WGS) has emerged as a cost-effective alternative to genotyping arrays and high-depth WGS. lp-WGS typically involves a read depth of 0.1x to 1x, with subsequent imputation. Imputation from genotyping arrays has been widely used in large-scale genome-wide association studies (GWAS), using either 1000 Genomes or TOPMed reference panels. GLIMPSE2 was specifically designed to impute from lp-WGS data. It is unclear whether standard GWAS imputation approaches could be applied to lp-WGS data, or whether other pipelines such as GLIMPSE2 are required. The present methods analysis compared genomic coverage and data quality imputed from lp-WGS using GLIMPSE2 versus TOPMed.

Methods: Using a test set of lp-WGS data generated from five DNA samples, we performed imputation using both GLIMPSE2 and TOPMed. Unfiltered data were submitted to imputation pipelines. Concordance between output variants was assessed. We also checked for concordance between pre-imputation variant calls with GLIMPSE2 versus TOPMed.

Results: There was 96.0% concordance among variants shared by datasets imputed by both GLIMPSE2 and TOPMed. GLIMPSE2 generated 63,824,182 variant calls while TOPMed generated 8,439,092 variant calls. Concordance between pre-imputation and imputed variants was 88.3% with GLIMPSE2 (n=11,950,240 non-missing calls) and 85.1% with TOPMed (n=6,379,129 non-missing calls).

Conclusions: Imputation of lp-WGS data with GLIMPSE2 yielded 7.6 times as many variants as TOPMed, indicating that TOPMed is not suitable for lp-WGS data. There was good agreement between variants generated by the two methods. It will be important to compare lp-WGS variant calls imputed with GLIMPSE2 versus variant calls generated using other standard methods.
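
The concordance figures above can be read as the fraction of shared, non-missing calls on which two call sets agree. The sketch below is one plausible way to compute that, not the authors' pipeline: the variant keys, genotype strings, and toy call sets are hypothetical, and genotypes are compared as plain strings without normalizing allele order or phasing. The last lines reproduce the variant-count ratio from the two totals reported in Results.

```python
from typing import Dict, Optional, Tuple

Variant = Tuple[str, int, str, str]      # (chromosome, position, ref, alt)
Calls = Dict[Variant, Optional[str]]     # genotype string, or None if missing


def concordance(a: Calls, b: Calls) -> Tuple[float, int]:
    """Return (fraction of matching calls, number of shared non-missing calls)."""
    shared = [v for v in a if v in b and a[v] is not None and b[v] is not None]
    if not shared:
        return float("nan"), 0
    matches = sum(a[v] == b[v] for v in shared)
    return matches / len(shared), len(shared)


# Hypothetical toy call sets standing in for the two imputation outputs.
glimpse2_calls: Calls = {
    ("chr1", 100, "A", "G"): "0/1",
    ("chr1", 200, "C", "T"): "1/1",
    ("chr1", 300, "G", "A"): "0/0",
    ("chr1", 400, "T", "C"): None,   # missing call, excluded from the comparison
}
topmed_calls: Calls = {
    ("chr1", 100, "A", "G"): "0/1",
    ("chr1", 200, "C", "T"): "0/1",  # discordant with the first call set
    ("chr1", 300, "G", "A"): "0/0",
    ("chr1", 400, "T", "C"): "0/1",
}

rate, n_shared = concordance(glimpse2_calls, topmed_calls)
print(f"concordance = {rate:.3f} over {n_shared} shared non-missing calls")

# Ratio of the variant-call totals reported in Results.
ratio = 63_824_182 / 8_439_092
print(f"GLIMPSE2 / TOPMed variant calls = {ratio:.1f}")
```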

N/A

Zinhle Cindi_Low-pass WGS poster.pdf

Qian-Quan

Sun

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Poster only

What have single-cell and spatial transcriptomics revealed about the plasticity of the medial prefrontal cortex in development, aging, and Alzheimer's disease?

Sun

Qian-Quan Sun*, Chunzhao Zhang, Yihan Wang, Tianyu (Chase) Cao, Derek Walton, Maycie Schultz, Madeline Bershinsky and Robby Johnston

Department of Zoology and Physiology, University of Wyoming; Wyoming Sensory Biology Center.

The medial prefrontal cortex (mPFC) is crucial for executive functions such as decision-making, initiation, and inhibition, acting as a hub for integrating various cognitive processes. Dysfunctions in the mPFC, particularly in its dorsal (dmPFC) and ventral (vmPFC) subdivisions, significantly contribute to a range of disorders, including autism spectrum disorder (ASD), obsessive-compulsive disorder (OCD), attention-deficit/hyperactivity disorder (ADHD), schizophrenia, and Alzheimer’s disease (AD), all of which impair executive functions. To investigate the molecular and circuit organization of the mPFC, we employed single-cell and spatially resolved transcriptome profiling in the mouse PFC across two critical age groups, using both wild-type and 5xFAD mouse models. Additionally, we collected data during goal-directed behaviors guided by positive and negative valence. Our results revealed a hierarchical mapping of cell types aligned with the Allen Brain Cell Atlas, highlighting significant regional differences between the dmPFC and vmPFC across various cell types, age groups, and stages of disease, particularly Alzheimer's. This underscores the specificity of key mPFC functions and dysfunctions in neurodevelopment and neurodegeneration. We found that projection neurons, such as those projecting to the motor cortex, which contain mixed classes of glutamatergic neurons, can be defined at the transcriptomic level. Network analysis of molecularly defined cell types in the PFC demonstrated distinct cellular communications and signaling networks associated with varying aging stages and pathological conditions, particularly early versus late AD. Overall, these findings indicate that the hierarchical molecular organization of the mPFC is finely tuned by brain states and disease progression, shedding light on the specificity of key mPFC functions and dysfunctions during neurodevelopment and neurodegeneration. 
Collectively, dynamic and comprehensive transcriptomic maps will facilitate the exploration of distinct molecular, cellular, and circuit mechanisms underlying specific mPFC functions in both health and disease.

N/A

psbabstract.pdf

Jude

Wells

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Poster only

Protein structure representations and kernel methods for predicting variant pathogenicity and mutation induced drug-resistance

Wells

Jude Wells, Brooks Paige

University College London, University College London

We investigate electron density-based representations of protein structures for predicting variant pathogenicity and mutation-induced drug resistance. Electron density offers a well-defined distance function between structural observations even when the two structures contain different atom types. This provides a convenient metric for measuring changes in structure induced by mutation and facilitates the application of predictive kernel methods like Gaussian processes. These methods enable robust prediction in limited data regimes while naturally providing uncertainty estimation. We test our method on ClinVar (predicting disease-causing mutations) and AB-Bind (predicting the change in antibody-antigen binding affinity from mutation). In both cases, we generate structural predictions for the mutated protein using all-atom diffusion models, convert the predicted structure to an electron density and use this to fit a Gaussian process. Our preliminary results, though far from state-of-the-art, show some promise on these challenging tasks and provide some insights into regions of the protein where mutations are likely to be particularly deleterious.

N/A

electron_density_keynote_v1_med_q.pdf

Keyan

Zhao

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Poster only

Interplay of tumor-intrinsic genomic features and immune microenvironment modulates clinical responses in DLBCL

Zhao

Keyan Kevin Zhao1, Paul Jung2, Tabrez A Mohammad1, Kwangbom Choi1, Tolga Turan1, Leo Wang-Kit Cheung1

1 AbbVie Genomics Research Center, Oncology Bioinformatics, 1000 Gateway Boulevard, South San Francisco, CA 94080,

2 AbbVie Genomics Research Center, Oncology Bioinformatics, 1 N Waukegan Road, North Chicago, IL 60064

Diffuse large B-cell lymphoma (DLBCL) is the most common type of aggressive non-Hodgkin lymphoma. Standard-of-care immunochemotherapy regimens, such as R-CHOP, cure only up to 60% of patients in the frontline setting, which presents significant unmet medical needs. The strong genetic and phenotypic heterogeneity in DLBCL presents a challenge in the development of novel treatments and drugs. Although recent efforts in the genetic and molecular classification of DLBCL, such as cell of origin (COO) and LymphGen genetic subtypes, have greatly advanced our understanding of the disease biology, there are limited efforts integrating multidimensional biomarker data, such as tumor mutations, gene expression profiling, and tumor microenvironment (TME) features. Leveraging recently published data from single-cell transcriptomics in NHL could further enhance genomic subtyping in DLBCL.

In this study, we explored the shared genomic features and immune landscape across multiple large cohorts of DLBCL patients. We developed a novel approach integrating the interplay of genomic lesions, gene expression profiles, and TME to identify biomarker signatures of clinical responses in DLBCL. We incorporated cellular interactions in the TME and cytokine-related cross-talks into the model. We identified distinct cell-cell interactions within different COO subpopulations and clarified the prognostic impact of various infiltrating immune cells in patients receiving standard-of-care immunochemotherapy. Our approach facilitated the development of prognostic models to identify more precise subgroups of high- and low-risk patients in different cellular backgrounds than other established molecular signatures. It also aided the development of precision medicine approaches for selecting patients who might benefit from other immunotherapies or combination therapies (e.g., the combination of BTK/BCL2 inhibitors and immunochemotherapies).

All authors are employees of AbbVie. The design, study conduct, and financial support for this research were provided by AbbVie. AbbVie participated in the interpretation of data, review, and approval of the publication. No honoraria or payments were made for authorship.

N/A

keyanzhao_psb2025_abstract.pdf

Weihao

Zhao

Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

Poster only

PathwayGAT: an explainable GNN framework integrating multi-layer information to reveal novel insights of complex biological processes

Zhao

Weihao Zhao, Shaoke Lou, Mark Gerstein

Yale University Program in Computational Biology & Bioinformatics, Yale University Department of Molecular Biophysics & Biochemistry

Yale University Program in Computational Biology & Bioinformatics, Yale University Department of Molecular Biophysics & Biochemistry

Yale University Program in Computational Biology & Bioinformatics, Yale University Department of Molecular Biophysics & Biochemistry

Graph Neural Networks (GNNs) have emerged as a powerful tool for biological research, particularly in modeling interactions between genes in various biological processes. However, there is still a significant gap in studies integrating multi-layered biological information beyond genes, such as microbes and SNPs, into a GNN framework. This integration is crucial, as research has shown that these components could play critical roles in biological processes. Ignoring them can lead to an incomplete understanding of the underlying mechanisms driving complex biological phenomena. To address this gap, we developed an approach that constructs a comprehensive network encompassing genes, microbes, SNPs, and potentially other components, organized into interconnected pathways. After training our GNN-based model on this network with multiple datasets, we obtained satisfactory results on different tasks. Our framework, PathwayGAT, is also explainable: beyond its performance in various predictive and analytical tasks, it enables a detailed examination of each feature within the network, shedding light on their individual and collective contributions to the biological process under study. This explainability not only enhances the model's utility but also makes it a valuable resource for researchers aiming to investigate complex biological systems further. PathwayGAT provides a robust platform for revealing mechanisms underlying biological processes, paving the way for novel insights and discoveries in computational biology.

https://f1000research.com/posters/13-1419

psb_poster_weihao_zhao.pdf

Lioba

Berndt

Translating Big Data Imaging Genomics Findings to the Individual

Poster only

Sleep Detectives: Thalamo-Cortical Modelling of Sleep in Children with 22q11.2 Deletion Syndrome

Berndt

Lioba CS Berndt, Nicholas A Donnelly, Ullrich Bartsch, Hayley A Moulding, Christopher Eaton, Hugh Marston, Meg Attwood, Abiola Saka, Christopher Jarrold, Julie Clayton, Jessica H Hall, Jeremy Hall, Michael J Owen, Marianne BM van den Bree, Matt W Jones, Alex D Shaw

Department of Psychology Faculty of Health & Life Sciences University of Exeter UK, Centre for Academic Mental Health University of Bristol Bristol UK, School of Physiology Pharmacology and Neuroscience University of Bristol Bristol UK, Medical Research Council Centre for Neuropsychiatric Genetics and Genomics Cardiff University Cardiff UK, Medical Research Council Centre for Neuropsychiatric Genetics and Genomics Cardiff University Cardiff UK, Translational Neuroscience Eli Lilly Windlesham United States, School of Psychological Science Faculty of Health & Life Sciences University of Bristol UK, School of Physiology Pharmacology & Neuroscience Faculty of Health & Life Sciences University of Bristol UK, School of Psychological Science Faculty of Health & Life Sciences University of Bristol UK, School of Population Health Sciences Bristol Medical School Faculty of Health & Life Sciences University of Bristol UK, Medical Research Council Centre for Neuropsychiatric Genetics and Genomics Cardiff University Cardiff UK, Medical Research Council Centre for Neuropsychiatric Genetics and Genomics Cardiff University Cardiff UK, Medical Research Council Centre for Neuropsychiatric Genetics and Genomics Cardiff University Cardiff UK, Medical Research Council Centre for Neuropsychiatric Genetics and Genomics Cardiff University Cardiff UK, School of Physiology Pharmacology & Neuroscience Faculty of Health & Life Sciences University of Bristol UK

22q11.2 Deletion Syndrome (22q11.2DS) is associated with a higher risk of psychiatric disorders, including schizophrenia, and cognitive impairments. Sleep disturbances, such as insomnia and fragmented sleep, are common in children with 22q11.2DS and are linked to neurodevelopmental symptoms like ADHD and anxiety.

As part of the Sleep Detectives project, which aims to identify early markers of psychiatric risk and guide potential interventions through the integration of sleep behavior, EEG, cognition, and biomarkers in children with copy number variants (CNVs), this study focuses on the computational modeling of sleep EEG data.

We applied a thalamo-cortical model to EEG data from children with 22q11.2DS and their siblings, comparing synaptic model parameters between the groups. Two key synaptic parameters significantly differed: GABA A receptor conductance from deep interneurons (di) to deep pyramidal cells (dp) in layer 5 during N3 sleep, and AMPA receptor conductance from spiny stellate cells (ss) in layer 4 to the thalamic reticular population (rl) during REM sleep.

Simulations adjusting these significant parameters resulted in the power spectral density (PSD) of the 22q11.2DS group shifting toward that of their siblings, suggesting that modifying these parameters could help normalize sleep EEG patterns. We also found correlations between these synaptic parameters and neurodevelopmental symptoms, including ADHD and autism spectrum disorder (ASD) traits.

This study advances our understanding of sleep disruptions in 22q11.2DS, offering insights into synaptic targets that may guide future interventions, such as potential drug targets, aimed at improving both sleep and cognitive outcomes.

N/A

lb_poster_psb2025.pdf

John

Van Horn

Translating Big Data Imaging Genomics Findings to the Individual

Poster only

Protein coding gene MYO1E drives significant differences in white matter microstructure between autistic individuals and controls

Seng

Gabriel R. Seng, Ian P. Adoremos, Zachary J. Jacokes, Benjamin T. Newman, Kevin A. Pelphrey, John Darrell Van Horn, for the GENDAAR Research Consortium

Langley High School, McLean, VA 22101

National Institute of Mental Health, Bethesda, Maryland 20892-3720

UVA School of Data Science, University of Virginia, Charlottesville, VA 22903

Department of Psychology, University of Virginia, Charlottesville, VA 22903

UVA School of Medicine, University of Virginia, Charlottesville, VA 22903

Identifying genes that drive white matter microstructural deficits in individuals with Autism Spectrum Disorder (ASD) remains critical for uncovering the biological mechanisms that contribute to the condition. White matter, composed of myelinated axons, plays a crucial role in brain connectivity and communication, and disruptions in its structure are often linked to cognitive and behavioral challenges in ASD. This study has uncovered genetic factors driving white matter microstructural defects present in children with ASD. To do so, it relies on new software, Connectographics, to visualize white matter-based intracellular microstructural networks between boys and girls with ASD and controls. This study acquired neuroimaging data using Siemens 3T Magnetom TrioTim and PRISMA scanners following standardized T1, EPI, and DWI acquisition protocols. Image preprocessing was performed using FSL and FreeSurfer 7.0, aligned with the Destrieux atlas. After connectograms were generated, genes that substantially influence case-control differences in white matter microstructure were identified. To identify these genes, a partial least squares (PLS) regression analysis was conducted, with regional expression values of 15,329 genes extracted from the Allen Human Brain Atlas and defined as dependent variables. Results from this PLS regression analysis identified the gene MYO1E as a significant factor that drives case-control differences between ASD and neurotypical individuals. The protein-coding gene MYO1E has been considered an important candidate gene associated with autism risk (SFARI Gene Score: 2) due to its enrichment and high intolerance to rare variants associated with autism. We therefore conclude that boys and girls with ASD harbor significant differences in white matter microstructural networks, and that the protein-coding gene MYO1E is a substantial factor driving these differences.

https://f1000research.com/posters/13-1422

abstract_for_pacific_symposium_on_biocmputing__final_

Jie

Hou

Workshop: Leveraging Foundational Models in Computational Biology: Validation, Understanding, and Innovation

Poster only

Enhancing RNA Motif Representation with Contrastive Learning and Language Models for Sequence-Structure Analysis

Chaudhari

Vinay Chaudhari, Sharear Saon, Grace Fu, Brent Znosko, Jie Hou

Department of Computer Science, Saint Louis University, Saint Louis, 63103, Missouri, United States,

Department of Chemistry, Saint Louis University, Saint Louis, 63103, Missouri, United States,

Parkway South High School, 801 Hanna Rd, Manchester, 63021, Missouri, United States,

Department of Chemistry, Saint Louis University, Saint Louis, 63103, Missouri, United States,

Department of Computer Science, Saint Louis University, Saint Louis, 63103, Missouri, United States,

RNA motifs, such as stems, loops, and bulges, play a pivotal role in RNA folding and structural organization, directly influencing their biological functions in gene expression regulation, protein synthesis, and molecular interactions. Despite the availability of extensive RNA sequence data, the limited number of experimentally resolved RNA structures restricts the development of computational approaches for accurately predicting RNA 3D structures. Traditional motif-based methods, which assemble tertiary structures using libraries of known structural motifs, struggle to generalize for RNA molecules with unrepresented motifs in these templates. To overcome this challenge, we propose a novel framework centered on RNA motifs, integrating machine learning techniques to enhance motif-level sequence and structure representation.

Our approach begins with a curated dataset of RNA structural motifs from the RNA CoSSMos database, encompassing diverse motif types such as symmetric loops, hairpins, and bulges. For sequence learning, we leverage the RNA-FM language model, fine-tuned on motif sequences to generate motif-specific embeddings. For structure learning, RNA motifs are encoded using a 3D ResNet model trained on spatial representations of motif structures through multi-class classification and contrastive learning. To further enhance understanding, we integrate sequence and structure embeddings using a contrastive learning paradigm to align the two representations in a shared latent space.

By focusing on RNA motifs, our framework delivers improved clustering and classification of motif types, revealing the intricate sequence-structure relationships that underpin RNA folding. This work underscores the importance of RNA motifs in structural analysis and demonstrates the potential of integrating motif-focused embeddings with AI-driven approaches to enhance RNA structural studies. The results will also be generalized to study the significance of motif-level features in advancing RNA structure prediction and provide a foundation for understanding RNA motifs' roles in biological processes.
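
The sequence-structure alignment step described above can be illustrated with a contrastive objective. The abstract does not name the loss, so the sketch below assumes an InfoNCE-style objective in one direction (sequence to structure); the function names, the temperature value, and the toy embeddings are hypothetical, and a symmetric variant would average this with the structure-to-sequence direction.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce(seq_emb: List[Sequence[float]],
             struct_emb: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean cross-entropy of picking the i-th structure for the i-th sequence."""
    n = len(seq_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(seq_emb[i], struct_emb[j]) / temperature for j in range(n)]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / n


# Hypothetical 2-D embeddings for two motifs.
sequence_embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned_structures = [[1.0, 0.0], [0.0, 1.0]]    # each pair points the same way
shuffled_structures = [[0.0, 1.0], [1.0, 0.0]]   # pairs are mismatched

aligned_loss = info_nce(sequence_embeddings, aligned_structures)
shuffled_loss = info_nce(sequence_embeddings, shuffled_structures)
print(f"aligned: {aligned_loss:.5f}, shuffled: {shuffled_loss:.5f}")
```

The loss is small when each motif's sequence embedding is closest to its own structure embedding and large when the pairs are mismatched, which is the property a shared latent space relies on.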

https://cs.slu.edu/~hou/PSB/Poster_PSB.pdf

Aaron

Fanous

Workshop: Opportunities and Pitfalls with Large Language Models for Biomedical Annotation

Accepted workshop abstract

Automating Variant Drug Annotations with Schema Based Constraints

Fanous

Aaron Fanous, Zara Ansari, Sonnet Xu, Vasiliki Bikia, Caroline Thorn, Mark Woon, Li Gong, Katrin Sankhul, Michelle Whirl-Carillo, Roxana Daneshjou, Teri Klein

Stanford Department of Biomedical Data Science,

Stanford Department of Biomedical Data Science,

Stanford Department of Computer Science,

Stanford Department of Biomedical Data Science, HAI

Stanford Department of Biomedical Data Science, Stanford Department of Genetics

Stanford Department of Biomedical Data Science, Stanford Department of Genetics

Stanford Department of Biomedical Data Science, Stanford Department of Genetics

Stanford Department of Biomedical Data Science, Stanford Department of Genetics

Stanford Department of Biomedical Data Science, Stanford Department of Genetics

Stanford Department of Biomedical Data Science, Stanford Department of Genetics

Stanford Department of Biomedical Data Science, Stanford Department of Dermatology

Stanford Department of Biomedical Data Science, Stanford Department of Genetics

Variant annotation in genetics is a complex and time-intensive process, often requiring the expertise of highly skilled professionals. The volume of data makes it challenging to respond effectively in a timely manner. This study evaluates schema-constrained Generative Pre-trained Transformer 4o (GPT-4o) for automating variant-drug data extraction, assessing its potential to streamline pharmacogenomic annotation while addressing key limitations.

Using 5,000 variant-drug annotations from Pharmacogenomics Knowledge Base (PharmGKB), we processed 1,850 unique variant annotations from 657 articles. GPT-4o was prompted to extract structured data, including variant identifiers (e.g., reference single nucleotide polymorphism (SNP) cluster IDs (rsIDs), genes, protein changes), associated drugs, phenotypes, and significance, into a JavaScript Object Notation (JSON) schema. Extraction quality was evaluated against a ground truth dataset, focusing on exact and partial matches for genes, drugs, and variants.

Results showed mixed performance. GPT-4o demonstrated moderate precision for gene names (41%), drugs (57.4%), phenotypes (66.7%), and significance (56.8%), but failed for exact variant matches (0%). Grouping by shared PubMed Identifier numbers (PMIDs) improved performance, with gene match rates rising to 74.8%, drug matches to 72.6%, and variant matches to 37.6%. Despite capturing aggregate information, the model struggled with alignment, precision, and complex associations, underscoring the need for better validation.

Schema-constrained GPT-4o shows promise in reducing manual workload for pharmacogenomic annotation by providing consistent output for routine tasks, but it struggles with exact matches and with managing complex associations, which affects accuracy in complex cases. Limitations such as inaccuracies, confabulation, and alignment issues indicate the need for refinement. Future work should leverage validation frameworks, specialized databases, agent-based systems, and quality assessment protocols to improve precision and reliability.

These findings underscore both the opportunities and challenges of using large language models in pharmacogenomics, highlighting the need for iterative development and caution. Addressing these limitations responsibly will help harness AI-driven tools to advance precision medicine.
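
The gap between row-level matching and matching after grouping by PMID can be shown with a toy comparison. This is a simplified sketch, not the authors' evaluation code: the record layout, field names, placeholder PMID, and both scoring functions are hypothetical, and the rates here are plain match fractions rather than the precision figures reported above.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, str]


def row_match_rate(pred: List[Record], truth: List[Record], field: str) -> float:
    """Fraction of position-aligned rows whose field matches exactly."""
    hits = sum(p[field] == t[field] for p, t in zip(pred, truth))
    return hits / len(truth)


def pmid_match_rate(pred: List[Record], truth: List[Record], field: str) -> float:
    """Fraction of ground-truth values found anywhere among predictions for the same PMID."""
    predicted_by_pmid = defaultdict(set)
    for p in pred:
        predicted_by_pmid[p["pmid"]].add(p[field])
    hits = sum(t[field] in predicted_by_pmid[t["pmid"]] for t in truth)
    return hits / len(truth)


# Hypothetical records: one article, two annotations, extracted in a different order.
truth = [
    {"pmid": "PMID-A", "gene": "CYP2D6", "drug": "codeine"},
    {"pmid": "PMID-A", "gene": "CYP2C19", "drug": "clopidogrel"},
]
pred = [
    {"pmid": "PMID-A", "gene": "CYP2C19", "drug": "clopidogrel"},
    {"pmid": "PMID-A", "gene": "CYP2D6", "drug": "codeine"},
]

row_rate = row_match_rate(pred, truth, "gene")
grouped_rate = pmid_match_rate(pred, truth, "gene")
print(f"row-level: {row_rate:.2f}, grouped by PMID: {grouped_rate:.2f}")
```

In this toy case the model extracted the right genes for the article but in a different order, so row-level matching scores zero while PMID-level matching scores one. That is one way an alignment problem, rather than an extraction problem, can depress exact-match figures.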

N/A

aaron_fanous_pbs.pdf

Saurav K

Aryal

General

Poster only

Un-learning Large Language Models: Challenges and Future Directions in Biomedical Privacy

Aryal

Saurav K. Aryal

Howard University

Large Language Models (LLMs) in biomedical use-cases present significant privacy challenges, particularly as training on sensitive health data may inadvertently retain user-specific information, violating data protection regulations like GDPR. This issue is compounded by the difficulty of ensuring that private data, once integrated into LLMs, can be effectively and verifiably removed. Conventional retraining approaches are resource-intensive and impractical at scale, highlighting the need for efficient "machine unlearning" methods.

Unlearning addresses privacy concerns by systematically removing the influence of specific data samples from trained models. Approaches include exact unlearning, such as the SISA framework, which partitions datasets for selective retraining (Bourtoule et al., 2021), and approximate methods, like certified data removal, which mitigate the computational burden through influence function estimation (Guo et al., 2020). Federated unlearning extends these principles to decentralized settings, addressing unique challenges in data distribution and access (Liu et al., 2021).

However, current unlearning techniques encounter limitations in data-scarce environments and for multilingual biomedical applications, where underrepresented languages lack annotated corpora. Addressing these gaps necessitates innovative strategies. Bayesian unlearning, for example, employs probabilistic models to approximate posterior distributions, offering resilience in low-resource contexts (Nguyen et al., 2020). Federated learning paradigms can also enhance data utilization across diverse linguistic groups by leveraging cross-model knowledge distillation.

Future improvements should focus on adaptive unlearning mechanisms that integrate multilingual embeddings and synthetic data generation to bolster performance where data is limited. Additionally, incorporating verification metrics, such as membership inference auditing and privacy leakage assessment, will ensure robust privacy guarantees across diverse biomedical domains. By advancing unlearning methodologies, LLMs can better navigate the intersection of privacy, efficiency, and inclusivity in biomedical applications.
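
The exact-unlearning idea behind SISA-style training can be sketched in a few lines: partition the data into shards, train one model per shard, and on a deletion request retrain only the shard that held the sample. The code below is a deliberately simplified illustration, not the SISA implementation: the "model" is just a shard mean, slicing and checkpointing within shards are omitted, and the class and method names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


class ShardedEnsemble:
    """Toy sharded ensemble: one 'model' (a mean) per shard, averaged at prediction."""

    def __init__(self, data: List[Tuple[int, float]], n_shards: int) -> None:
        self.shards: List[Dict[int, float]] = [dict() for _ in range(n_shards)]
        for sample_id, value in data:
            self.shards[sample_id % n_shards][sample_id] = value
        self.models: List[Optional[float]] = [self._train(s) for s in self.shards]
        self.retrain_count = 0

    @staticmethod
    def _train(shard: Dict[int, float]) -> Optional[float]:
        # Stand-in for real training; an empty shard yields no model.
        return mean(shard.values()) if shard else None

    def predict(self) -> float:
        # Aggregate the shard models by simple averaging.
        return mean(m for m in self.models if m is not None)

    def unlearn(self, sample_id: int) -> None:
        # Remove the sample and retrain only the shard that contained it.
        k = sample_id % len(self.shards)
        self.shards[k].pop(sample_id)
        self.models[k] = self._train(self.shards[k])
        self.retrain_count += 1


# Hypothetical data: (sample_id, value); sample 3 is the record to be forgotten.
records = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 100.0)]
ensemble = ShardedEnsemble(records, n_shards=2)
before = ensemble.predict()
ensemble.unlearn(3)
after = ensemble.predict()

# Reference: training from scratch without the forgotten sample.
scratch = ShardedEnsemble(records[:3], n_shards=2).predict()
print(before, after, scratch)
```

Because each shard model depends only on its own shard, the result after unlearning equals a from-scratch retrain without the removed sample, at the cost of retraining one shard instead of all of them.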

N/A

psb_abstract.pdf

Cosmin

Bejan

General

Poster only

DrugWAS X PheWAS: A high-throughput approach for analyzing how drugs taken during pregnancy affect pediatric outcomes

Bejan

Cosmin A. Bejan, Amelie Pham, Leena Choi, Sarah Osmundson, S. Trent Rosenbloom, Elizabeth J. Phillips

Vanderbilt University Medical Center

Regulatory barriers regarding the participation of pregnant women in drug trials have contributed to the lack of knowledge about the risks, safety, and teratogenic effects of these drugs. Here, we describe a drug-wide X phenome-wide association study (DrugWAS X PheWAS) framework for investigating associations between maternal drug exposures and pediatric outcomes in Electronic Health Record (EHR) data. Further, we evaluated machine learning methods for accurate gestational age (GA) estimation at birth using EHR data. The study participants consisted of all mother-child dyads from the Vanderbilt University Medical Center (VUMC) Research Derivative (RD), a repository of identified EHRs of >5 million patients. For each mother-child dyad, we extracted the drug exposures of the mother during pregnancy and the health outcomes of the child. We developed a gold dataset of GA expressions available in the RD to train three machine learning models for GA estimation: linear regression, random forest, and gradient boosting. The features used for training these models included maternal age at delivery, maternal race and ethnicity, infant sex, and International Classification of Diseases (ICD) codes for preterm, term, and post-term deliveries. Preliminary analysis revealed that 86% of mothers were prescribed at least one drug during pregnancy, with 78% receiving two or more prescriptions. Drugs frequently used during pregnancy include ondansetron, promethazine, folic acid, docusate, famotidine, and nitrofurantoin. The most frequent outcomes in children up to 1 year old include neonatal jaundice, atrial septal defect, patent ductus arteriosus, cardiac murmur, and septicemia of newborn. The best model for GA estimation was gradient boosting, with a mean squared error of 207.53. Future work includes improving the extraction of drug exposures and pediatric outcomes from EHR data using natural language processing methods.
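
A mean squared error is easier to interpret after taking its square root, which puts the error back in the units of the target. The abstract does not state the unit used for GA, so the sketch below only does the arithmetic; the helper function and the toy values are hypothetical.

```python
import math
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy values, only to show the function in use.
toy_mse = mean_squared_error([280.0, 259.0], [270.0, 266.0])

# Square root of the MSE reported in the abstract, in whatever unit GA was measured.
reported_mse = 207.53
reported_rmse = math.sqrt(reported_mse)
print(f"toy MSE = {toy_mse}, reported RMSE = {reported_rmse:.1f}")
```

If GA was recorded in days, which is an assumption, the reported error would correspond to a typical deviation of roughly two weeks.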

http://adi.bejan.ro/papers/2025_PSB_Bejan_etal_DrugWASPheWAS_poster.pdf

Davide

Buzzao

General

Poster only

FunCoup 6: advancing functional association networks across species with directed links and improved user experience

Buzzao

Davide Buzzao, Emma Persson, Dimitri Guala, Erik L.L. Sonnhammer

Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden

FunCoup 6 (https://funcoup6.scilifelab.se/) represents a significant advancement in global functional association networks, aiming to provide researchers with a comprehensive view of the functional coupling interactome. This update introduces novel methodologies and integrated tools for improved network inference and analysis. Major new developments in FunCoup 6 include vastly expanded coverage of gene regulatory links, a new framework for bin-free Bayesian training, and a new website. FunCoup 6 integrates a new tool for disease and drug target module identification using the TOPAS algorithm. To expand the utility of the resource for biomedical research, it incorporates pathway enrichment analysis using the ANUBIX and EASE algorithms. The unique comparative interactomics analysis in FunCoup provides insights into network conservation, now allowing users to align orthologs only or query each species network independently. Bin-free training was applied to 23 primary species, and networks were additionally generated for all remaining 618 species in InParanoiDB 9. Accompanying these advancements, FunCoup 6 features a redesigned website together with updated API functionalities, and represents a pivotal step forward in functional genomics research, offering unique capabilities for exploring the complex landscape of protein interactions.

https://doi.org/10.7490/f1000research.1119861.1

abstract_psb2025.pdf

Andres

Cardenas

General

Poster only

Machine Learning for the Development of Methylation Risk Score Predictors

Cardenas

Andres Cardenas, Dennis Khodasevich, Nina Holland, Lars van der Laan

Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA,

Center for Environmental Research and Community Health (CERCH), Berkeley Public Health, University of California, Berkeley, Berkeley, CA, USA,

Department of Statistics, University of Washington, Seattle, WA, USA

Background: DNA methylation (DNAm) provides a window to characterize the impacts of environmental exposures and the biological aging process. Epigenetic clocks are often trained on DNAm using penalized regression of CpG sites, but recent evidence suggests potential benefits of training epigenetic predictors on principal components.

Methodology/Findings: We developed a pipeline to simultaneously train three epigenetic predictors: a traditional CpG Clock, a PCA Clock, and a SuperLearner PCA Clock (SL PCA). We gathered publicly available DNAm datasets to generate i) a novel childhood epigenetic clock, ii) a reconstructed Hannum adult blood clock, and iii) as a proof of concept, a predictor of polybrominated biphenyl exposure, each using the three development methodologies. We used correlation coefficients and median absolute error to assess fit between predicted and observed measures, as well as agreement between duplicates. The SL PCA clocks improved fit with observed phenotypes relative to the PCA clocks or CpG clocks across several datasets. We found evidence for higher agreement between duplicate samples run on alternate DNAm arrays when using SL PCA clocks relative to traditional methods. Analyses examining associations between relevant exposures and epigenetic age acceleration (EAA) produced more precise effect estimates when using predictions derived from SL PCA clocks.

Conclusions: We introduce a novel method for the development of DNAm-based predictors that combines the improved reliability conferred by training on principal components with advanced ensemble-based machine learning. Coupling SuperLearner with PCA in the predictor development process may be especially relevant for studies with longitudinal designs utilizing multiple array types, as well as for the development of predictors of more complex phenotypic traits.

N/A

poster.pdf

Otakuye

Conroy-Ben

General

Poster only

Wastewater-Based Epidemiology in Tribal Communities to Assess Pathogens

Conroy-Ben

Otakuye Conroy-Ben, Thuy Nguyen, Naeema Cheshomi

School of Sustainable Engineering and the Built Environment, Arizona State University

Wastewater-based epidemiology (WBE) measures biomarkers found in excreted human waste (urine and feces) to interpret the health of the community from which sewage was sampled. WBE was critical in determining SARS-CoV-2 infections during the COVID-19 pandemic. The overall objective of this project was to assess the wastewater microbiome for bacteria, archaea, viruses, and eukaryotic microorganisms using genomic sequencing techniques. Wastewater samples were collected from two Tribal communities and were then filtered through a 0.45 µm filter or concentrated with magnetic nanoparticles, followed by sequencing via 16S (bacteria and archaea) and metagenomic approaches.

16S rRNA sequencing results showed the presence of common sewer and human fecal matter microbes, including Proteobacteria (Aeromonas, E. coli, Pseudomonas, Sphingomonas, and Neisseria species, among others), Campylobacteria (Arcobacter and Pseudoarcobacter species), Fusobacteria (Leptotrichia species), and Bacteroidota (Bacteroides and Prevotella species). Bacteria appearing in wastewater arise from human and animal waste (urine/fecal matter), discarded drinks and food, sewer lines, and other sanitary practices (showering, laundry, hand washing). Generally, this source points to the human microbiome (gut bacteria) and could also provide information on microbial outbreaks, including pathogens of concern and antimicrobial-resistant bacteria. Whole genome sequencing was beneficial in identifying all microbes listed in the Centers for Disease Control and Prevention’s (CDC) Active Bacterial Core surveillance program and Antibiotic Resistance Threats report, with the exception of Salmonella typhi and Candida auris. Common wastewater archaea present at the Class level included Halobacteria (salt-tolerant), Methanobacteria (human gut), Methanomicrobia (methane producing, human gut), Methanococci (methane producing, human gut), and Thermoprotei (sulfur metabolizing) sub-species. Some archaeal species of the gut microbiome are reported to correlate with conditions including colorectal cancer, diabetes, and bowel syndromes, and these species were detected in Tribal wastewater.

N/A

Terence

Egbelo

General

Poster only

Handling topological bias in biomedical knowledge graphs

Egbelo

Terence Egbelo, Zeyneb Kurt, Charlie Jeynes, Mike Bodkin, Val Gillet

University of Sheffield, University of Sheffield, Evotec, University of Dundee, University of Sheffield

A biomedical knowledge graph (KG) aggregates and interconnects associations between entities such as proteins, compounds and diseases. A variety of properties relevant to drug discovery programmes can be encoded as graph edges between entities, for example the co-occurrence of certain protein targets and adverse events. These pairwise relations may associate with other patterns in the graph, such as the tendency of adverse events to associate with proteins that interact with each other.

The task of KG completion, including in the biomedical domain, is concerned with learning mappings between pairwise relations of interest and other graph patterns, and using them to predict new instances of those relations.

This poster summarises a study tackling the prediction of target-adverse event associations as KG completion, with special attention paid to the problem of topological bias in KGs. This problem occurs when biases such as research attention (reflected in the graph through high entity degree) associate consistently with the relation of interest. Bonner et al (2021) demonstrated that topological bias is pervasive in biomedical KGs and belies the performance achieved in KG completion by many state-of-the-art methods.

The present work shows the presence of this distortion in a KG designed for the prediction of new target-adverse event relations and explores a novel procedure to reduce the impact of topological bias as part of a metapath-based (e.g. Lao et al 2010, Himmelstein et al 2017) KG completion workflow. This is achieved by injecting a representation of relation-specific entity rarity into the metapath features, taking inspiration from Spärck Jones (1972).

The modified metapath approach displays competitive KG completion performance and a decreased tendency to predict new target-adverse event relations for pairings of higher-degree entities, demonstrating mitigation of the topological bias problem while retaining predictive accuracy.
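The idea of a relation-specific rarity weight can be pictured with an inverse-document-frequency style score in the spirit of Spärck Jones (1972). The sketch below is only an illustration under stated assumptions, not the study's implementation: the function names, the toy edge list, and the choice to scale a raw metapath count by the mean rarity of its two endpoints are all hypothetical.

```python
import math
from collections import Counter


def relation_rarity(edges, relation):
    """IDF-style rarity per entity for one relation type.

    edges: iterable of (head, relation, tail) triples.
    Returns {entity: log(N / degree)}, where N is the number of edges of
    the given relation and degree counts the entity's edges of that relation.
    """
    rel_edges = [(h, t) for h, r, t in edges if r == relation]
    degree = Counter()
    for h, t in rel_edges:
        degree[h] += 1
        degree[t] += 1
    n = len(rel_edges)
    return {entity: math.log(n / d) for entity, d in degree.items()}


def weighted_feature(path_count, source, target, rarity):
    """Scale a raw metapath count by the mean endpoint rarity (assumed scheme)."""
    weight = (rarity.get(source, 0.0) + rarity.get(target, 0.0)) / 2
    return path_count * weight


# Toy graph: P1 is a heavily studied hub, P5 and P6 are rarely annotated.
edges = [
    ("P1", "interacts", "P2"),
    ("P1", "interacts", "P3"),
    ("P1", "interacts", "P4"),
    ("P5", "interacts", "P6"),
]
rarity = relation_rarity(edges, "interacts")
hub_feature = weighted_feature(2, "P1", "P2", rarity)
rare_feature = weighted_feature(2, "P5", "P6", rarity)
```

With this weighting, the same raw path count contributes less when it connects a high-degree entity than when it connects two low-degree entities, which is the direction of correction the abstract describes.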

https://terence-egbelo-academic-posters.s3.us-east-1.amazonaws.com/Terence+Egbelo+PSB+2025+poster.pdf

Nowlan

Freese

General

Poster only

Integrating data sources for visualization in the Integrated Genome Browser

Freese

Nowlan H. Freese, Jaya Sravani Sirigineedi, Karthik Raveendran, Paige J. Kulzer, Ann E. Loraine

University of North Carolina at Charlotte

We develop the Integrated Genome Browser (IGB), an open-source desktop genome browser that supports interactive visualization of genomic datasets such as aligned sequences (BAM/CRAM/SAM from "Seq" experiments), gene annotations (BED/GFF), variation (VCF/BCF), and coverage (wiggle/bedGraph). With the rapid output of new genomes, it has become increasingly difficult for our small team to keep up with curating and storing new genomes. Therefore, we have pushed to make IGB even more "integrated" by connecting to resources that provide genome sequences and annotations (UCSC, track hubs, Quickload), data storage and analytics capability (Galaxy, CyVerse), and interactive web-based "Seq" data exploration interfaces (BAR eFP-Seq). For example, by utilizing the UCSC Genome Browser REST API we can provide most of the assemblies and tracks from UCSC for visualization within IGB. As IGB runs locally, a user can now view data stored on their computer in IGB alongside UCSC tracks without the need to upload their data to the web. Future work will add support for viewing genomes from Ensembl via their API. Our goal is to make it as easy as possible for users to view data from many sources within the Integrated Genome Browser.
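As a rough illustration of the kind of request involved, the sketch below builds URLs for two list endpoints of the public UCSC Genome Browser REST API (list/ucscGenomes and list/tracks). It is not IGB code, which is a separate desktop application; the helper names ucsc_url and fetch_json are hypothetical, and the endpoint paths reflect the public API as we understand it, so they should be checked against the UCSC documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

UCSC_API = "https://api.genome.ucsc.edu"


def ucsc_url(endpoint, **params):
    """Build a UCSC REST API URL such as .../list/tracks?genome=hg38."""
    url = f"{UCSC_API}/{endpoint.strip('/')}"
    return f"{url}?{urlencode(params)}" if params else url


def fetch_json(url, timeout=30):
    """Download and decode a JSON response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


genomes_url = ucsc_url("list/ucscGenomes")
tracks_url = ucsc_url("list/tracks", genome="hg38")
# tracks = fetch_json(tracks_url)  # uncomment to query the live service
```

Because the requests are made from the user's machine, local files never need to leave it; only the remote assembly and track data are downloaded.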

N/A

Carsten

Görg

General

Poster only

SHAPE: a visual computing pipeline for interactive landmarking of 3D photograms and patient reporting for assessing craniosynostosis

Görg

Carsten Görg, Connor Elkhill, Jasmine Chaij, Kristin Royalty, Phuong D. Nguyen, Brooke French, Ines A. Cruz-Guerrero, Antonio R. Porras

Colorado School of Public Health, Colorado School of Public Health, Children’s Hospital Colorado, Children’s Hospital Colorado, Children’s Hospital Colorado, Children’s Hospital Colorado, Colorado School of Public Health, Colorado School of Public Health

3D photogrammetry is a cost-effective, non-invasive imaging modality that does not require the use of ionizing radiation or sedation. Therefore, it is especially valuable in pediatrics and is used to support the diagnosis and longitudinal study of craniofacial developmental pathologies such as craniosynostosis, the premature fusion of one or more cranial sutures resulting in local cranial growth restrictions and cranial malformations. Analysis of 3D photogrammetry requires the identification of craniofacial landmarks to segment the head surface and compute metrics to quantify anomalies. Unfortunately, commercial 3D photogrammetry software requires intensive manual landmark placement, which is time-consuming and prone to errors. We designed and implemented SHAPE, a System for Head-shape Analysis and Pediatric Evaluation. It integrates our previously developed automated landmarking method in a visual computing pipeline to evaluate a patient’s 3D photogram while allowing for manual confirmation and correction. It also automatically computes advanced metrics to quantify craniofacial anomalies and automatically creates a report that can be uploaded to the patient’s electronic health record. We conducted a user study with a professional clinical photographer to compare SHAPE to the existing clinical workflow. We found that SHAPE allows for the evaluation of a craniofacial 3D photogram more than three times faster than the current clinical workflow (3.85±0.99 vs. 13.07±5.29 minutes, p < 0.001). Our qualitative study findings indicate that the SHAPE workflow is well aligned with the existing clinical workflow and that SHAPE has useful features and is easy to learn.

https://www.dropbox.com/scl/fi/6mmica1l7txey9krtlzi5/SHAPE_PSB_Poster.pdf?rlkey=54lagm35kujja35wm7cqj3i0h&dl=0

Jane

Huang

General

Poster only

The Role of Gendered Language in Eugenic Sterilization Recommendations

Huang

Jane Huang, Dr. Carly Bobak, Dr. Jacqueline Wernimont

The Eugenics Rubicon Project is a research initiative focused on analyzing the demographics and experiences of over 60,000 individuals sterilized under U.S. Eugenics laws during the 20th century. Drawing on data from the Sterilization and Social Justice Lab, this sub-project explores the complex interplay between gender, societal norms, and the language used by California institution superintendents to justify sterilizations. Our research specifically examines the gendered biases present in records from California, uncovering how gender-based stigmas influenced recommendations for female vs male sterilization.

Using natural language processing (NLP) techniques such as topic modeling, we analyze the language within records, revealing patterns where patients marked female were often described in terms of promiscuity or hyper-sexuality, in stark contrast to male patients, who were often recommended for sterilization based on STDs/STIs, which were frequently linked to homosexuality. This language not only reinforces harmful gender biases but also reflects the broader societal stigmatization of women and men based on sexual freedom and homosexuality, respectively. To effectively communicate these findings, we employ data visualization methods that adhere to the principles of Data Humanism, balancing aggregated trends with individual stories to humanize the patients behind the data.

Our visualizations aim to bridge the gap between aggregated data, which often sacrifices individual identities, and case-specific visualizations, which may lose broader context. By utilizing innovative visual storytelling, we aim to provoke empathy and engagement with these historical injustices, emphasizing the role of data visualization as a tool for social change. Our work represents a positive disruption in data science and bioinformatics by challenging traditional data visualization practices, integrating principles of data feminism, and promoting accountability and empathy in interpreting sensitive health data.

https://drive.google.com/file/d/1zKxiiTNxeE36hY1BC-6Q3t2WCaEbMdKV/view?usp=sharing

psb_abstract_poster_janehuang.pdf

Hirotaka

Iijima

General

Poster only

In silico exercise model as a tool to interrogate a health promoting effect of exercise on knee joint

Iijima

Hirotaka Iijima, PhD, PT1,2,3, George Ross Malik, MD2,3, Fabrisia Ambrosio, PhD, MPT1,2,3

1Discovery Center for Musculoskeletal Recovery, Schoen Adams Research Institute at Spaulding, Charlestown, MA

2Department of Physical Medicine & Rehabilitation, Harvard Medical School, Boston, MA

3Department of Physical Medicine & Rehabilitation, Spaulding Rehabilitation Hospital, Charlestown, MA

A better understanding of the dose-dependent effects of exercise on cartilage health is an essential step in promoting the clinical success of exercise interventions and the development of exercise mimetics for the treatment of knee osteoarthritis (KOA). This study employs a series of computational approaches to elucidate the dose-dependent mechanisms through which exercise regulates cartilage health in the context of KOA. The integrated findings across in vivo, ex vivo, and in vitro KOA models revealed that moderate and intense exercise exerted opposing effects on inflammatory signals and cellular senescence, partly through selective regulation of Trpv4 and Piezo1/Piezo2 ion channels. Building on the mechanistic framework identified through these studies, we introduced a network medicine approach, which we termed “in silico exercise”, to interrogate the downstream cartilage molecular responses to Trpv4 and Piezo1/Piezo2 gene perturbation. We discovered that in silico exercise at moderate and intense levels exerted distinct transcriptional effects on intra-cellular signals, which were due, at least in part, to the differential regulation of estrogen receptor signaling. Computational analyses were validated through pharmacological manipulations. Our network medicine approach represents a novel framework for the mechanistic dissection of the health-promoting effects underlying physical therapeutics, even beyond KOA.

N/A

psb2025_abstract_in_silico_exercise.pdf

Semin

Lee

General

Poster only

Development of multi-omics analysis platform for 3D organoid-based personalized medicine prediction system

Jang

Jinho Jang, Hyoung-oh Jeong, Se-Hoon Lee, Joo Kyung Park, Hyunsook Lee, Semin Lee

Ulsan National Institute of Science and Technology, Ulsan National Institute of Science and Technology, Samsung Medical Center, Samsung Medical Center, Seoul National University, Ulsan National Institute of Science and Technology

Tumor three-dimensional organoid culture models show great promise as a tool for cancer precision medicine, with an application in studying drug response. Here, we describe the development of a multi-omics analysis platform for a 3D organoid-based personalized medicine prediction system that integrates whole-exome sequencing, whole-transcriptome sequencing, and methylation sequencing with high-throughput drug screens on patient-derived tumor organoids.

N/A

psb_2025postersemin_lee20240927.pdf

Jae-Hwan

Jhong

General

Poster only

Comparison of algorithms for estimating regression splines using lasso penalty

Lee

Eun-Ji Lee, Jong-Beom Park, Dong-Young Lee, Jae-Hwan Jhong

Department of Information Statistics, Chungbuk National University

The Lasso regression model is a penalization technique that facilitates variable selection from high-dimensional data. When combined with a truncated power spline, Lasso regression addresses the knot selection problem from a nonparametric regression perspective. We compare the performance of three methods (coordinate descent, quadratic programming, and the alternating direction method of multipliers) in fitting Lasso regression spline models. Through simulations and analyses of two real datasets, we conduct numerical studies to evaluate the performance of these methods. By comparing their performance under various conditions, this study ultimately offers recommendations for the most suitable algorithm for different scenarios.
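To make the knot selection setup concrete, here is a minimal sketch of one of the three approaches, coordinate descent, applied to a truncated power basis. It is an illustration under assumptions rather than the authors' code: the objective scaling (1/2n), the choice to penalize only the knot columns, the toy data, and all function names are ours.

```python
def truncated_power_basis(x, knots, degree=1):
    """Rows of [x, ..., x^degree, (x-k1)_+^degree, ..., (x-kK)_+^degree]."""
    return [
        [xi ** d for d in range(1, degree + 1)]
        + [max(xi - k, 0.0) ** degree for k in knots]
        for xi in x
    ]


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def objective(X, y, b0, beta, lam, penalized):
    """(1/2n) * residual sum of squares + lam * L1 norm of penalized coefficients."""
    n = len(y)
    rss = sum(
        (y[i] - b0 - sum(X[i][j] * beta[j] for j in range(len(beta)))) ** 2
        for i in range(n)
    )
    return rss / (2 * n) + lam * sum(abs(beta[j]) for j in penalized)


def lasso_cd(X, y, lam, penalized, n_iter=500):
    """Cyclic coordinate descent for the objective above; returns (b0, beta)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = sum(y) / n
    resid = [yi - b0 for yi in y]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        # Unpenalized intercept: absorb the mean of the current residuals.
        shift = sum(resid) / n
        b0 += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (column j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam if j in penalized else 0.0) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return b0, beta


# Toy data: a piecewise-linear curve whose slope changes at x = 0.5.
x = [i / 20 for i in range(21)]
y = [xi + 2.0 * max(xi - 0.5, 0.0) for xi in x]
X = truncated_power_basis(x, knots=[0.25, 0.5, 0.75])
knot_columns = {1, 2, 3}  # column 0 is the unpenalized linear term
b0, beta = lasso_cd(X, y, lam=0.01, penalized=knot_columns)
```

A knot is treated as selected when its coefficient is nonzero; raising lam drives more knot coefficients to exactly zero, which is what turns knot placement into a variable selection problem.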

https://f1000research.com/posters/13-1470

poster_jhong.pdf

Moon-Chang

Baek

General

Poster only

Multiplexed Detection Platform Implemented with Magnetic Encoding and Deep Learning-based Decoding for Quantitative Analysis of Exosomes from Cancers

Lee

Soyoung Lee, Sang-Hyun Kim, Moon-Chang Baek

Department of Innovative Pharmaceutical Sciences, College of Advanced Technology Convergence, Kyungpook National University, Daegu, Republic of Korea,

Department of Pharmacology, School of Medicine, Kyungpook National University, Daegu, Republic of Korea,

Department of Molecular Medicine, CMRI, School of Medicine, Kyungpook National University, Daegu, Republic of Korea

Although a number of biosensing technologies have been reported for the detection of cancer-derived exosomes used as early diagnosis markers for cancers, it is necessary to identify them with various biomarkers to distinguish the stages and types of cancers owing to the extreme heterogeneity of cancer. Here, we developed a new multiplexed assay platform for the detection of exosomes using magnetic encoded microparticles (MEMPs), which can recognize multiple proteins expressed on exosomes, and a deep learning-based decoding algorithm. This platform, in which the accuracy of the decoding algorithm was evaluated to be 93%, was applied to detect exosomes from four types of cancer cell lines and plasma from patients with cancer using three cancer biomarkers: PD-L1 (Programmed Death-Ligand 1), EpCAM (Epithelial cell adhesion molecule), and EGFR (Epidermal growth factor receptor). The limits of detection (LODs) of this platform for exosomes from the MDA-MB-231 cell line were calculated as 4.03 × 10⁶ mL⁻¹ for PD-L1, 1.00 × 10⁷ mL⁻¹ for EpCAM, and 7.17 × 10⁶ mL⁻¹ for EGFR. In a clinical study, four types of samples from patients with cancer (n = 92) showed higher signals than those of healthy controls (n = 18). Based on these results, we confirmed that this platform can distinguish patients with cancer from healthy individuals.

N/A

abstract.pdf

Hee-Jung

Jee

General

Poster only

REAL-TIME MISSING DATA IMPUTATION USING ROBUST STATISTICAL MODELS IN DYNAMIC DATA STREAMS

Lee

Jun Haeng Lee, Hee-Jung Jee

Chungbuk National University, Chungbuk National University

Streaming data refers to data that is generated and transmitted in real time. Since streaming data rapidly increases in both volume and velocity, it is processed and analyzed as soon as it arrives, rather than after being accumulated. However, missing data may introduce bias in the estimates or complicate the analysis process. Traditional imputation methods, which rely on imputing data after accumulation, are difficult to apply to streaming data. We propose a real-time missing value imputation technique using flexible spline models that dynamically update point estimates and standard errors. This method is doubly robust and computationally efficient, as estimates of regression coefficients from previous batches of data are updated for new batches and combined with summary statistics. Through simulation, we confirmed that the bias in the estimates was reduced when missing values were imputed using the proposed method. We expect this method to help reduce data loss due to missing values in streaming data.
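The batch-wise update can be pictured with a deliberately simplified sketch that keeps only running summary statistics and never stores past batches. It is not the proposed method: a straight line stands in for the spline model, the doubly robust estimator and standard errors are omitted, and the class and method names are hypothetical.

```python
class StreamingLinearImputer:
    """Running fit of y ~ intercept + slope * x from summary statistics only."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def coefficients(self):
        """Return (intercept, slope), or None until the slope is identifiable."""
        if self.n < 2:
            return None
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            return None
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope

    def process_batch(self, batch):
        """Impute missing y (None) from the current fit, then update the statistics.

        Only observed pairs are folded into the running sums, so imputed
        values never feed back into the fit.
        """
        coef = self.coefficients()
        completed = []
        for x, y in batch:
            if y is None and coef is not None:
                y_filled = coef[0] + coef[1] * x
            else:
                y_filled = y
            completed.append((x, y_filled))
        for x, y in batch:
            if y is not None:
                self.n += 1
                self.sx += x
                self.sy += y
                self.sxx += x * x
                self.sxy += x * y
        return completed


imputer = StreamingLinearImputer()
first = imputer.process_batch([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
second = imputer.process_batch([(3.0, None), (4.0, 9.0)])
```

Each batch costs time proportional to its own size, and memory use stays constant regardless of how much data has already streamed past.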

N/A

Eric

Lee

General

Poster only

Uncovering the tissue ecosystem with a comprehensive spatiotemporal analytical framework

Lee

Eric Lee, Tina Hsu, Andrew Roth, Samuel Aparicio

Department of Molecular Oncology, British Columbia Cancer Agency, Canada

High-plex in situ subcellular profiling technologies are evolving the study of biological tissue by revealing the spatial context of molecular distribution and cellular communication. Spatial data can elucidate the links between expression dynamics and localization which are critical to identifying biomarkers and therapeutic targets. Deciphering high-plex imaging data involves a non-trivial multi-layered workflow ensembling different tools. Computational methods specific to spatial analysis have been rapidly developing, yet current downstream analyses remain heavily dependent on established single cell methods that ignore the spatial context.

Here, we present Sakura, a framework for systematic and reproducible spatial-omics analyses at subcellular resolution. Sakura integrates state-of-the-art and newly developed methods for image processing and spatiotemporal analysis into a modularized package. To streamline workflow design in Sakura, statistical testing for method evaluation has been implemented into each module. Sakura is highly scalable and easily customizable, making it ideal for analyzing spatial transcriptomics, proteomics, and their integration. We demonstrate the advantages of Sakura through analyses of normal breast tissue as well as follicular lymphoma.

N/A

sakura_poster.pdf

Jui-Hsuan

Chang

General

Poster only

Enhancing Clinical Outcome Predictions through Effective Sample Size Evaluation in Graph-Based Digital Twin Modeling

Li, Chang, Ven

Xi Li, Jui-Hsuan Chang, Mythreye Venkatesan, Zhiping Paul Wang, Jason H. Moore

Department of Computational Biomedicine, Cedars-Sinai Medical Center, West Hollywood, CA

Digital twins in healthcare offer a revolutionary approach to enhancing the healthcare system by enabling personalized diagnosis, prognosis, and treatment. SynTwin, a novel methodology for creating digital twins that combines synthetic data and network science, has shown promise in improving predictions for breast cancer mortality, demonstrating its potential to advance precision medicine efforts. In this study, we further validate SynTwin by applying it to different cancer types from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (USA). We assess its efficacy across varying sample sizes (1,000 to 30,000 records), mortality rates (35% to 60%), and study designs, revealing insights into the strengths and limitations of synthesized data in mortality prediction. Our results indicate that for larger datasets, with sample sizes exceeding 10,000 cases, including a synthetic patient population in the nearest network neighbor prediction model consistently improves performance compared to using real patients alone. Specifically, AUROCs ranged from 0.828 to 0.884 for cancers such as cervix uteri and ovarian cancer with synthetic patients, compared to 0.720 to 0.858 when using real patient data. These results highlight the benefit of network-based digital twins, while emphasizing the importance of considering effective sample size when developing predictive models like SynTwin.

N/A

Steven

Brenner

General

Poster only

RISE: Relative Impact of Splicing and Expression in transcriptome studies

Lin

Yu-Jen Lin, Amr A. Alazali, Zhiqiang Hu, Steven E. Brenner

Yu-Jen Lin 1,2 , Amr A. Alazali 1,3 , Zhiqiang Hu 2,4,5 , Steven E. Brenner 1,2,4,6

1 Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA

2 Center for Computational Biology, University of California, Berkeley, California 94720, USA

3 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA

4 Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA,

5 Currently at: Illumina, Foster City, California 94404, USA, 6 College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA

RNA-seq has been widely used to quantify expression and splicing changes in transcriptomes. Although biological consequences arise from changes in both expression and splicing, researchers usually rely on their impressions to choose only one aspect to analyze, potentially overlooking significant impacts of the other. Even if researchers investigate both, the measurement scales of expression and splicing are different, and thus their impacts cannot be compared directly. To compare the relative impact of expression and splicing, we have developed RISE. RISE quantifies the relative impact of expression and splicing changes caused by a treatment. To place the impact of expression and splicing changes on the same scale, we developed the Normalized Variation (NV) measure. NV is defined as the proportion of the between-group variation to the total variation. Finally, we assess whether expression NV (eNV) or splicing NV (sNV) is significantly larger to understand the comparative influence of expression versus splicing alterations in the transcriptome.

To validate our method, we performed RISE analysis on RNA-seq data from knockdown or overexpression experiments of 11 transcription and splicing factors. RISE effectively categorizes transcription and splicing factors by their relative impacts on expression and splicing. As an example application, we applied RISE to 4 studies involving proteins with complex or previously unknown roles in regulating transcriptomes to understand their functions. In summary, RISE enables researchers to systematically compare the relative impact of expression and splicing.
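The NV definition (between-group variation divided by total variation) can be sketched as follows. The abstract does not say how variation is measured or how eNV and sNV are tested against each other, so the use of sums of squares here is an assumption, and the function name and toy numbers are purely illustrative.

```python
def normalized_variation(groups):
    """Between-group sum of squares divided by total sum of squares.

    groups: list of lists of measurements, one inner list per condition
    (for example control and treated replicates). Returns a value in [0, 1].
    """
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    total = sum((v - grand_mean) ** 2 for v in values)
    if total == 0:
        return 0.0  # no variation at all, so nothing is attributable to groups
    between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return between / total


# Made-up replicate values for one gene: expression level and percent spliced in.
expression = [[10.0, 12.0, 11.0], [20.0, 22.0, 21.0]]  # control, treated
splicing = [[0.50, 0.55, 0.45], [0.52, 0.50, 0.54]]    # control, treated

env = normalized_variation(expression)
snv = normalized_variation(splicing)
```

Because both quantities are proportions of their own total variation, they share a unitless 0-to-1 scale even though expression and splicing are measured in different units. In this toy case the treatment shifts expression far more than splicing, so eNV is the larger of the two.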

https://compbio.berkeley.edu/poster/250106_LinJ_PSB_RISE_poster.pdf

250106_brennerse_psb_rise_abs_1_dec_2024.pdf

Xueying

Liu

General

Poster only

CSIGEP: A scalable, GPU-based unsupervised machine learning tool for recovering gene expression programs in atlas-scale single-cell RNA-seq data

Liu

Xueying Liu, Richard H. Chapple, Paul Geeleher

Department of Computational Biology,

St. Jude Children's Research Hospital, Memphis, TN, USA

Single-cell RNA-seq (scRNA-seq) data can now be cheaply collected from millions of cells. Typically, these huge “atlases” are interrogated using tools such as UMAP, performing hard clustering over two-dimensional projections of the data. However, conventional methods have been heavily criticized, with data distorted by arbitrary parameter choices, and fidelity to ground truth easily lost. Consensus-NMF-based models, which allow scRNA-seq data to be represented as “gene expression programs” (GEPs), perform vastly better in preserving the underlying structure of single-cell RNA-seq data, but currently, these methods are not scalable. Here, we have developed a computationally efficient GPU-based consensus-NMF method. We implemented the method using a machine learning technique called “mini-batching”, where the model is trained on iterative subsets of the data and is thus scalable to a dataset of any size. We show using simulated data that this approach can recover ground truth GEPs in atlas-scale scRNA-seq data and outperforms existing methods (base NMF, iNMF, SignatureAnalyzer, and a variational autoencoder-based model, scVI). As proof-of-principle, we have applied the method to an atlas-scale integrated scRNA-seq map of human tumors and preclinical models. The method recovers GEPs missed by existing approaches, yielding novel insights into the fidelity of transcriptional programs in preclinical cell line models.

N/A

psb2025_liu_poster.pdf

Sriraam

Natarajan

General

Poster only

Causal Bayesian Network Construction for Patients on ECMO using Large Language Models

Mathur

Saurabh Mathur, Ranveer Singh, Michael Skinner, Ethan Sanford, Neel Shah, Phillip Reeder, Lakshmi Raman, Sriraam Natarajan

The University of Texas at Dallas, The University of Texas at Dallas, The University of Texas at Dallas, The University of Texas Southwestern Medical Center, Washington University at St. Louis, The University of Texas Southwestern Medical Center, The University of Texas Southwestern Medical Center, The University of Texas at Dallas

Extracorporeal Membrane Oxygenation (ECMO) is a method for supporting patients with severe cardiac or respiratory failure. However, ECMO patients are at a higher risk of neurological injury (NI). Understanding underlying causal mechanisms is critical for clinical decision-making. Causal Bayesian Networks (CBNs) offer a powerful framework for representing and reasoning about the complex interplay of factors influencing these patients; however, obtaining complete CBNs from domain experts can be challenging. To this effect, we explore the use of Large Language Models (LLMs) for CBN construction. While LLMs can reproduce causal relationships reflected in their training data, they may also generate spurious associations. We address this by refining the LLM-generated BN using data from 71 patients and indirect domain knowledge, such as partial variable ordering and constraints on impossible edges (e.g., neurological injury cannot influence hypertension within the first 24 hours on ECMO). We found that the refinement procedure removed three spurious edges from the LLM-elicited BN (High VIS to Hypertension, Hypotension to low pH, low pH to high lactate), maintained one edge (Low Platelet to NI) and added two new edges (Relative pCO2 to NI, Hypertension to Low Platelet). The key difference from the expert-elicited BN was that in the refined BN, Relative pCO2 directly influenced NI instead of via Low pH. Our preliminary results suggest that LLMs along with refinement can be an effective method for Causal BN construction.
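The constraint part of the refinement (partial variable ordering and impossible edges) can be illustrated with a small filter over candidate edges. This sketch covers only that step and not the data-driven scoring on the 71 patients; the tier assignments, the candidate edge list, and the function name are hypothetical and were not taken from the study.

```python
def filter_edges(edges, tiers, forbidden=frozenset()):
    """Keep candidate edges that respect a partial ordering and are not forbidden.

    edges: list of (cause, effect) pairs proposed for the network.
    tiers: {variable: int}; an edge is rejected when its cause sits in a
           later tier than its effect. Variables without a tier are
           left unconstrained.
    forbidden: set of (cause, effect) pairs ruled out by domain knowledge.
    """
    kept = []
    for cause, effect in edges:
        if (cause, effect) in forbidden:
            continue
        cause_tier, effect_tier = tiers.get(cause), tiers.get(effect)
        if (
            cause_tier is not None
            and effect_tier is not None
            and cause_tier > effect_tier
        ):
            continue
        kept.append((cause, effect))
    return kept


# Hypothetical ordering: early physiological states precede neurological injury.
tiers = {"Hypertension": 0, "Low Platelet": 0, "Neurological Injury": 1}
candidates = [
    ("Low Platelet", "Neurological Injury"),
    ("Neurological Injury", "Hypertension"),  # violates the ordering
    ("Hypertension", "Low Platelet"),
]
refined = filter_edges(candidates, tiers)
```

A filter of this kind only removes edges that contradict prior knowledge; deciding which of the remaining edges are supported, or which new edges to add, still requires scoring against patient data.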

N/A

psb_ecmo_draft.pdf

Jason

McDermott

General

Poster only

Enhancing Community Metabolic Modeling in Biofilms for One Health

McDermott

Jason McDermott, Sapna Dass, Sneha Couvillion, Winston Anthony, Tara Nitka, Amy Zimmerman, Christine Chang, Carrie Nicora, William Nelson

Pacific Northwest National Laboratory, Texas A&M University, Pacific Northwest National Laboratory, Pacific Northwest National Laboratory, Pacific Northwest National Laboratory, Pacific Northwest National Laboratory, Pacific Northwest National Laboratory, Pacific Northwest National Laboratory, Pacific Northwest National Laboratory

Increased development and agricultural expansion have brought the issue of zoonotic transmission to the forefront of the fight against infectious disease, precipitating catastrophic world-wide events such as the recent COVID-19 pandemic. Transmission of viruses between wild animal and livestock populations is poorly understood, and methods for monitoring are limited. Animal-associated environmental biofilms, which can harbor fomites, could represent a vital monitoring point in disease tracing and surveillance efforts. As part of a USDA-APHIS funded project, we are assessing the capability of livestock- and wild animal-associated biofilms to serve as reservoirs for SARS-CoV-2 and as potential hot spots for monitoring of animal and human viruses in general. We are using a combination of metagenomics, metatranscriptomics, and metabolomics, together with an enhanced community metabolic modeling pipeline, to identify which metabolic and functional characteristics of the biofilms affect their ability to harbor infectious viruses.

Genome-scale metabolic models have been widely used to understand metabolism of individual bacteria but are difficult to apply to microbiome analysis due to incomplete genome sequences, gaps in annotation for key enzymes, and the lack of inclusion of experimental data from transcriptomics or metabolomics. I will present three separate tools developed by our groups especially aimed at difficult genomes derived from metagenomic sequencing. MetaPathPredict, is a deep learning framework for prediction of the presence of complete metabolic modules in an organism from incomplete genome data. The OMics-Enabled Global GApfilling (OMEGGA) tool, performs simultaneous global gap-filling from experimental growth data and allows integration of multi-omics data. Finally, Snekmer is a generalized computational framework for building sequence-based models for protein families for enzyme function. Our results demonstrate that our integrated, data-driven approach to improving metabolic results in improved metabolic models that are more informative for understanding the roles of biofilms as environmental reservoirs of important pathogens.

http://jasonya.com/McDermott_CMICPoster_format_2024.pdf

Irene

Mei

General

Poster only

Transcriptional modulation unique to vulnerable motor neurons predicts Amyotrophic lateral sclerosis across species and SOD1 mutations

Mei

Irene Mei(1), Susanne Nichterwitz(1,3), Melanie Leboeuf(1,2), Jik Nijssen(2,3), Isadora Lenoel(4), Dirk Repsilber(5), Christian S. Lobsiger(4), Eva Hedlund(1,2,3)

1 Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden

2 Department of Cellular and Molecular Biology, Karolinska Institutet, Stockholm, Sweden

3 Department of Neuroscience, Karolinska Institutet, Stockholm, Sweden

4 Sorbonne Université, Institut du Cerveau-Paris Brain Institute - ICM, Inserm, CNRS, Paris, France

5 School of Medical Sciences, Örebro University, Örebro, Sweden

Amyotrophic lateral sclerosis (ALS) is characterized by the progressive loss of somatic motor neurons (MNs), which innervate skeletal muscles. However, certain MN groups, including ocular MNs that regulate eye movement, are relatively resilient to ALS. To reveal mechanisms of differential MN vulnerability, we investigate the transcriptional dynamics of two vulnerable and two resilient MN populations in SOD1G93A ALS mice. Differential gene expression analysis shows that each neuron type displays a largely unique spatial and temporal response to ALS. Resilient MNs regulate few genes in response to disease, but show clear divergence in baseline gene expression compared to vulnerable MNs, which in combination may hold the key to their resilience. EASE, fGSEA and ANUBIX enrichment analyses demonstrate that vulnerable MN groups share pathway activation, including regulation of neuronal death, inflammatory response, ERK and MAPK cascades, cell adhesion and synaptic signaling. These pathways are largely driven by 11 upregulated genes (Atf3, Cd44, Gadd45a, Ngfr, Ccl2, Ccl7, Gal, Timp1, Nupr1, Serpinb1a and Chl1), and indicate that cell death occurs through similar mechanisms across vulnerable MNs, albeit with distinct timing. Machine learning using DEGs upregulated in our SOD1G93A spinal MNs predicts disease in human stem cell-derived MNs harboring the SOD1E100G mutation, and shows that dysregulation of VGF, PENK, INA and NTS is a strong disease predictor across SOD1 mutations and species. Meta-analysis across mouse SOD1 transcriptome datasets identified a shared transcriptional vulnerability code of 32 genes, including Sprr1a, Atf3, Fgf21, C1qb, Nupr1, Gap43, Adcyap1, Vgf, Ina and Mt1. In conclusion, our study reveals vulnerability-specific gene regulation that may act to preserve neurons and can be used to predict disease.

N/A

poster_im_2024.pdf

Jaye

Moors

General

Poster only

Returning Results in Practice: A Case Study from Aotearoa, New Zealand

Moors

Jaye Moors, Olivia Gray, Ratu Rapana, Colin Beumelburg, Melissa Hendershott, Sarah LeBaron von Baeyer, Kaja Wasik, Tony R. Merriman, Nuku Rapana

Variant Bio. Inc. Seattle, WA, USA, Variant Bio. Inc. Seattle, WA, USA, Pukapuka Community Centre, Māngere, Auckland, New Zealand, Pukapuka Community Centre, Māngere, Auckland, New Zealand, Variant Bio. Inc. Seattle, WA, USA, Variant Bio. Inc. Seattle, WA, USA, Variant Bio. Inc. Seattle, WA, USA, Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, USA & Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand, Pukapuka Community Centre, Māngere, Auckland, New Zealand.

Genomics research has faced skepticism from Indigenous and minority communities due to a history of harm and inequity driven by a lack of transparency and extractive research practices. A central cause of these concerns is the ‘absent researcher’ who fails to return relevant health findings and data to participating communities. Returning research results on both the individual and aggregate scale is a crucial act of reciprocity and is essential to rebuilding trust and fostering locally meaningful relationships.

This case study highlights the outcomes of a genomics research project designed in partnership between researchers from academia and industry in the US and Aotearoa NZ, and the Pukapuka community of Auckland. The study was guided by the University of Otago, Pacific Research Protocols and emphasized trust-building and community empowerment through extensive engagement, respect for cultural protocols, equitable benefit-sharing, and the return of individual and aggregated health findings informed by community health priorities.

Throughout the study, actionable medical results from blood and urine tests were returned to individual participants by a medical doctor who was part of the all-Pukapukan study team. Following completion of the study, group-level results were shared by researchers during a community event at the Pukapuka Community Centre, Auckland. This milestone event incorporated several Pacific languages, and diverse communication methods such as an in-person presentation, printed materials, visual aids, and local media outlets to enhance accessibility. Community stakeholders played a key role in shaping the process, ensuring alignment with their values and areas of interest. By tailoring the results return process to the unique needs of the Pukapuka community, we ensured that the findings addressed locally defined health priorities and respected cultural norms. This approach demonstrates how genomics partnerships can move towards equity by fostering meaningful partnerships and ensuring research benefits are shared in culturally appropriate and impactful ways.

N/A

psb2025_poster_jmoors.pdf

Susana

Posada Cespe

General

Poster only

Molecular Signatures of Cone-Saving Compounds in Human Retinal Organoids

Posada Cespe

Stefan E. Spirig, Susana Posada-Céspedes, Valeria J. Arteaga-Moreta, Zoltan Raics, Stephanie Chreng, Olaf Galuba, Inga Galuba, Isabelle Claerr, Steffen Renner, Larissa Utz, P. Timo Kleindienst, Adrienn Volak, Jannick Imbach, Svitlana Malysheva, Rebecca A. Siwicki, Vincent Hahaut, Yanyan Hou, Simone Picelli, Marco Cattaneo, Josephine Jüttner, Cameron S. Cowan, Myriam Duckely, Daniel K. Baeschlin, Magdalena Renner, Vincent Unterreiner, Botond Roska

Institute of Molecular and Clinical Ophthalmology Basel & University of Basel, Institute of Molecular and Clinical Ophthalmology Basel, Institute of Molecular and Clinical Ophthalmology Basel, Institute of Molecular and Clinical Ophthalmology Basel, Novartis Biomedical Research, Novartis Biomedical Research, Novartis Biomedical Research, Novartis Biomedical Research, Novartis Biomedical Research, Institute of Molecular and Clinical Ophthalmology Basel, Institute of Molecular and Clinical Ophthalmology Basel, Institute of Molecular and Clinical Ophthalmology Basel, Institute of Molecular and Clinical Ophthalmology Basel, Institute of Molecular and Clinical Ophthalmology Basel & University of Basel, Institute of Molecular and Clinical Ophthalmology Basel, Institute of Molecular and Clinical Ophthalmology Basel, Institute of Molecular and Clinical Ophthalmology Basel, Institute of Molecular and Clinical Ophthalmology Basel, Institute of Molecular and Clinical Ophthalmology Basel & University of Basel, Institute of Molecular and Clinical Ophthalmology Basel, Institute of Molecular and Clinical Ophthalmology Basel, Novartis Biomedical Research, Novartis Biomedical Research, Institute of Molecular and Clinical Ophthalmology Basel, Novartis Biomedical Research, Institute of Molecular and Clinical Ophthalmology Basel & University of Basel

Degeneration of cone photoreceptors is a major contributor to blindness in retinal diseases affecting central vision, such as age-related macular degeneration and cone-rod dystrophies. Despite its significance, the mechanisms underlying cone degeneration remain poorly understood, hindering the development of effective treatments. Studying these mechanisms requires physiologically relevant models; notably, human retinal organoids better reflect human-specific retinal biology compared to animal models.

In this study, we conducted a large-scale compound screen in human retinal organoids under glucose starvation—a condition that induces cone death. From an initial pool of 2,707 candidates, we identified three compounds (HSP90AA1I-1, CS-KI-1, and CS-KI-2) with significant protective effects on cones during seven days of glucose starvation. To uncover genes and pathways involved in promoting cone survival, we analyzed transcriptomic changes in cones under glucose starvation with and without treatment with the identified cone-protective compounds.

HSP90AA1I-1 induced marked transcriptomic shifts, notably upregulating the unfolded protein response, consistent with HSP90's established role as a molecular chaperone. However, prolonged inhibition of HSP90 over 14 days caused significant damage to cones. In contrast, kinase inhibitors CS-KI-1 and CS-KI-2 appeared to protect cones via alternative pathways independent of their known inhibition targets. To differentiate their cone-protective mechanisms from these targets, we also compared transcriptomic changes induced by CS-KI-1 and CS-KI-2 to those caused by structurally similar, yet non-protective, analogs (CS-KI-1A and CS-KI-2A). CS-KI-1 downregulated apoptotic and inflammatory pathways, suggesting enhanced cone survival, while CS-KI-2 upregulated genes related to mTORC1 signaling and metabolic processes, including cholesterol and fatty acid metabolism, indicating a role in metabolic regulation.

These findings reveal distinct molecular mechanisms engaged by cone-protective compounds, offering insights into potential therapeutic pathways to mitigate cone degeneration.

Vivekananda

Sarangi

General

Poster only

Predicting cell type for single cells amplified DNA with Primary Template-directed Amplification (PTA)

Sarangi

Vivekananda Sarangi, Livia Tomasini, Liana Fasching, Yan Asmann, Flora Vaccarino, Alexej Abyzov

Department of Quantitative Health Science Mayo Clinic, Child Study Center Yale University, Child Study Center Yale University, Department of Quantitative Health Science Mayo Clinic, Child Study Center Yale University and Department of Neuroscience Yale University, Department of Quantitative Health Science Mayo Clinic

Introduction:

Single-cell DNA sequencing involves three main steps: isolating a single cell, amplifying its DNA, and sequencing the amplified DNA. Isolating specific cell types from a mixed sample is challenging, particularly in complex tissues like tumors, where cancer cells coexist with normal and immune cells. Although cell sorting based on surface markers is possible, it is limited to certain cell types. An alternative approach where one can infer cell type directly from sequencing data will be highly beneficial.

Objective:

We have developed a method to predict cell type using whole-genome sequencing (WGS) data from single-cell DNA amplified with the Primary Template-directed Amplification (PTA) technique, which has demonstrated superior performance over older methods such as Multiple Displacement Amplification (MDA). PTA provides higher uniformity of genome coverage and lower allelic dropout rates by amplifying the primary DNA template and limiting error propagation. PTA relies on phi29 polymerase. We hypothesize that the polymerase's preference for a C to T error at methylated versus unmethylated cytosines can be used to predict methylated regions, which, when aggregated over the entire genome, could be used to reveal cell types.

Results:

Using 3 iPSC single cells and 3 neurons from frozen brain (NeuN+ marker), we have identified 1043 regions (500 base pairs each) that have differential C to T error profiles between the two cell types. We used these regions to cluster 69 single nuclei from 18 brains (data obtained from our experiments and publicly available data) and 11 iPSC single cells (from our experiment). Clustering revealed 3 clusters: 2 corresponding to iPSCs and neurons, and a third possibly representing non-neuronal brain cells, suggesting that our method can be used to identify different cell types from PTA-amplified DNA sequencing data.
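The region-level C to T error profile described above can be illustrated with a minimal sketch. The abstract does not specify the input format or the exact statistic, so the function name, the (position, read_base) input format, and the simple per-region fraction are assumptions made for illustration only, not the authors' pipeline.

```python
from collections import defaultdict

def ct_error_rate_by_region(observations, region_size=500):
    """Return {region_start: fraction of reads showing T at reference-C positions}.

    `observations` is an iterable of (position, read_base) pairs, one per read
    base aligned to a reference cytosine (a hypothetical input format).
    """
    totals = defaultdict(int)   # read bases observed at reference C, per region
    c_to_t = defaultdict(int)   # of those, how many were read as T
    for position, read_base in observations:
        region_start = (position // region_size) * region_size
        totals[region_start] += 1
        if read_base == "T":
            c_to_t[region_start] += 1
    return {start: c_to_t[start] / totals[start] for start in totals}

# Toy data: two 500 bp regions with different C>T fractions.
toy = [(10, "C"), (20, "T"), (30, "C"), (40, "C"),
       (510, "T"), (520, "T"), (530, "C"), (540, "C")]
rates = ct_error_rate_by_region(toy)
```

Per-region rates of this kind, computed for each cell, could then serve as the feature vector passed to a clustering step.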

N/A

abstract_psb_2025_vs.pdf

Hyun-Tae

Shin

General

Poster only

Detection of Minimal Residual Disease in Non-Small Cell Lung Cancer Using WGS

Shin

Jun Hyeok Lim, Jeong-Seon Ryu, Hyun-Tae Shin

Department of Internal Medicine Inha University Hospital Incheon Republic of Korea, Department of Dermatology Inha University Hospital Incheon Republic of Korea, Research Center for Controlling Intercellular Communication (RCIC) Inha University School of Medicine Incheon Republic of Korea

The detection of minimal residual disease (MRD) following cancer treatment is critical for preventing recurrence and improving patient prognosis. However, existing MRD detection methods often lack the sensitivity required for precise monitoring. We have developed a highly sensitive method for MRD detection using whole-genome sequencing (WGS) analysis of cell-free DNA (cfDNA) with individual-specific features. The approach begins by identifying consensus single nucleotide variants (SNVs) that are consistently detected by multiple variant callers in the patient’s tumor WGS. Using a novel algorithm, we quantify supporting reads at the consensus SNV positions and assess their statistical significance based on the error distribution in the patient’s plasma WGS. Validation through cancer cell-line mix experiments demonstrated a sensitivity of less than 1:12,800. Furthermore, the method effectively detected MRD in clinical samples from non-small cell lung cancer. This highly sensitive approach for detecting MRD offers the potential to improve cancer patient prognosis significantly. Moreover, this technology holds great promise for broad applicability across various cancer types where cfDNA-based monitoring can provide clinical benefits.
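The abstract does not describe the algorithm used to assess significance, so the following is only a generic sketch of the idea of testing supporting reads against a background error rate. It assumes a one-sided binomial test on read counts pooled across consensus SNV positions; the function names and example numbers are hypothetical.

```python
import math

def binom_log_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, computed in log space."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    tail = sum(math.exp(binom_log_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, tail)

# Hypothetical numbers: 30,000 plasma reads pooled over the consensus SNV
# positions, a background error rate of 1e-4, and 12 tumor-supporting reads.
p_value = binom_sf(12, 30_000, 1e-4)
```

A small p-value here would indicate more tumor-supporting reads than sequencing error alone would be expected to produce under these assumptions.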

N/A

Junghwa

Hong

General

Poster only

Pilot Study of a Patched-CNN Model for Detecting the Location of the Endolymphatic Duct in the Human Vestibular System

Song

Hyeyeong Song¹, Soonmoon Jung¹, Youngho Lee¹, Jaemin Kim¹, Jiwoo Jang¹, Inyeop Na¹, Seungyun Oh¹, Sung Huhn Kim², Joo Hyun Kim³, Junghwa Hong¹†

Korea University¹†,

Yonsei University²,

New York University³

Benign Paroxysmal Positional Vertigo (BPPV) is characterized by brief episodes of rotational vertigo triggered by head position changes, primarily due to floating otoconia in the semicircular canals that have detached from the macula of the utricle. Despite initial treatments of BPPV, including vestibular rehabilitation exercises or canalith repositioning maneuvers, recurrence rates reach up to 13.5% within six months, 18% within one year, and approximately 50% over five years. To achieve complete and recurrence-free treatment of BPPV, it is necessary to mobilize the otoconia within the utricle through endolymph flow, guiding them to the endolymphatic fluid outlet for removal via the endolymphatic sac. The location of the endolymphatic fluid outlet within the utricle varies between individuals; therefore, it is essential to identify its three-dimensional position through vestibular MRI or CT imaging analysis. However, the endolymphatic fluid outlet has a small diameter of approximately 0.1-0.2 mm, and individual variations in vestibule shape and size complicate accurate localization through conventional imaging analysis. This study aimed to provide useful information for BPPV diagnosis and treatment by predicting 3D landmark positions within the vestibular organ. To achieve this, we proposed a patch-based CNN model (PCNN1) that adopts a multi-task learning approach, performing classification and regression simultaneously. The 3D coordinates of six landmarks were collected from head MRI data of 20 healthy individuals. PCNN1 was compared with single-task models (PCNN2 and PCNN3) and an intermediate-layer supervision model (PCNN4). PCNN1 had smaller prediction errors than PCNN2 and PCNN3, demonstrating the benefits of multi-task learning. Although PCNN4 showed the smallest error, it had the longest runtime, making PCNN1 the preferred model for its balance of accuracy and runtime.

N/A

psb_2025_poster.pdf

Karen

General

Poster only

Text Classification Model for Acute Myeloid Leukemia Risk Stratification

Karen Vo, Erika Li, Jack Guo Zeng, Tian Yi Zhang

Stanford Department of Medicine, Stanford Department of Medicine, Stanford Department of Medicine, Stanford Division of Hematology

Acute Myeloid Leukemia (AML) is a type of cancer originating in the bone marrow that affects blood cell production. AML’s primary treatments include chemotherapy and bone marrow transplants. Prognosis for AML patients often relies on the European LeukemiaNet (ELN) 2022 criteria, which stratifies patients solely based on genetic mutations. While this may be effective to some extent, this method overlooks other critical clinical and demographic factors, limiting its ability to predict survival outcomes comprehensively. Our project addresses this gap by leveraging advanced machine learning techniques and large language models to transform the way that AML risk stratification is performed. Using a HIPAA-compliant data pipeline, we extract and process unstructured textual data from electronic health records (EHRs), encompassing demographic details, pathology reports, treatment data, karyotype data, and mutation information. These data points are aggregated into a unified format and processed through Google Cloud Platform’s AutoML text classification models to improve prognostication accuracy. The model stratifies patients into those likely to die within two years versus those with longer survival, enabling a more nuanced risk assessment. This work is important because it taps into underutilized clinical data, demonstrating that factors beyond genetic mutations significantly contribute to patient outcomes. By integrating a broader range of data, our approach bridges critical gaps in AML risk assessment, offering scalability and automation for handling complex datasets. Preliminary results highlight the potential of this method to outperform traditional models, with implications for precision medicine and improved patient care. Future directions include expanding the dataset to incorporate social determinants of health and applying this methodology to other diseases, further emphasizing the transformative impact of combining LLMs and machine learning in healthcare.

N/A

brite_poster.pdf

Julia

Salzman

General

Poster only

STRUCT: a Statistical Approach to Identify RNA Secondary Structures from Raw Sequencing Data, Bypassing Multiple Sequence Alignment

Wang

Julie Fangran Wang, Arjun Rustagi, Julia Salzman

Stanford University, University of California, San Francisco, Stanford University

Here we present STRUCT, a direct, assembly-, alignment-, and metadata-free statistical method for the bioinformatic inference of putatively structured RNA elements directly from raw sequencing data. By avoiding the statistical biases inherent to traditional computational approaches, STRUCT provides an ultra-fast, easy-to-use, and robust tool for hypothesis generation. We show that STRUCT is able to rediscover known structural elements in human and environmental viruses and to identify previously unknown viral and cellular regulatory elements. By working at the read level, STRUCT is positioned to accelerate the RNA structure discovery pipeline to match the rate of sequence generation.

N/A

Rene

Warren

General

Poster only

ntRoot: Scalable Ancestry Predictions from Genome Sequencing Data

Warren

Rene L Warren, Lauren Coombe, Johnathan Wong, Parham Kazemi, Inanc Birol

Canada’s Michael Smith Genome Sciences Centre, Vancouver, Canada, V5Z 4S6

Ancestry information is essential for large cohort studies, yet it is not always available or reliably measured. For studies with a genome sequencing component, ancestry predictions using current approaches are hindered by high computational demands and complex input requirements. We present ntRoot, a computationally-lightweight method for inferring human super-population-level ancestry from whole genome assemblies or raw sequencing data types, and demonstrate its utility on over 300 datasets. Leveraging an alignment-free variant detection framework that uses a succinct Bloom filter data structure to efficiently query a flexible genomic data input, and the integrated variant call sets from the 1000 Genomes Project (1kGP), ntRoot accurately predicts human ancestry from 153 1kGP and 279 1kGP-independent Simons Genome Diversity Project whole-genome sequencing (WGS) data. Ancestry predictions are computed within 30m using at most 13GB of RAM on complete and draft human genome assemblies, and in less than 1h15m on 30X WGS data, requiring up to 68GB of RAM. The ntRoot paradigm offers both global and local ancestry inference, providing high-resolution predictions across genomic loci. ntRoot fills a critical gap in cohort studies by enabling rapid and accurate pedigree inference at scale, promising to advance ancestry predictions for association studies in the genomic era. Availability: https://github.com/bcgsc/ntRoot
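To illustrate the kind of data structure the abstract refers to, here is a minimal, generic Bloom filter for k-mer membership queries. It is a teaching sketch only: the class, the SHA-256-based hashing, and the parameter values are assumptions chosen for simplicity and do not reflect ntRoot's actual implementation.

```python
import hashlib

class BloomFilter:
    """A simple Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits=1 << 20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def kmers(sequence, k):
    """Yield every overlapping k-mer of `sequence`."""
    return (sequence[i:i + k] for i in range(len(sequence) - k + 1))

# Load the k-mers of a toy sequence, then query without any alignment step.
reference_filter = BloomFilter()
for kmer in kmers("ACGTACGTTAGC", 5):
    reference_filter.add(kmer)
```

Because membership is stored as bits rather than as the k-mers themselves, memory use is fixed by `num_bits` regardless of k-mer length, at the cost of occasional false positives.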

https://zenodo.org/doi/10.5281/zenodo.13844276

Akira

Wolfe

General

Poster only

Breathing Life Back into Zombie GPUs: A Sustainable Framework for AI Development

Wolfe

Akira Wolfe, Tyden Rucker, Keolu Fox, Eric Dawson

UC San Diego Cognitive Science, UC San Diego Computer Science, Native BioData Consortium, Nvidia Corporation

Due to rapid technological innovation in GPUs and their adoption by data centers, older GPU models contribute substantially to e-waste generation. With data centers opting for the newest models, older GPUs that are still functional are tossed aside. This trend reflects functional obsolescence, where demand for advanced GPUs shortens their intended lifespan. Another source of e-waste is white labeling, the process of taking hardware that does not meet performance benchmarks but is still usable for less demanding tasks. Industry estimates put defect rates at 1-3%, which could mean hundreds of thousands of defective GPUs annually for Nvidia. This project proposes a new pipeline in which older GPUs can be reused in areas of computing outside SOTA data centers, possibly in underserved communities, and aims to formalize an estimate of the amount of GPU e-waste generated by SOTA data centers. Alongside these estimates, it aims to identify and delineate areas that the second-hand GPU market can serve, focusing on underserved communities. After this preliminary estimation and analysis, the project focuses on offering infrastructure and tools, such as open source code repositories and toolkits, to facilitate the creation of this new second-hand GPU pipeline. The current GPU pipeline needs optimization to reduce e-waste while promoting the sustainability of GPU usage. One goal is to take advantage of the graveyard of unused hardware by using it to create less e-waste, build decentralized AI/ML community hubs, and reduce financial costs through second-hand purchasing. With the GPU market valued at over $25 billion annually, repurposing even 10% of GPUs could unlock millions of overlooked resources, with a positive impact on developing nations and tribal nations through more accessible and sustainable computing.

N/A

psb_akira.pdf

Matt

Wright

General

Poster only

ClinGen’s Variant Curation Interface (VCI) imports evidence data in standard formats via the Linked Data Hub (LDH)

Wright

Matt Wright, Christine Preston, Tierra Farris, Neethu Shah, Mark Mandell, Liam Mulhall, Bryan Wulf, Gabriella Sanchez, Gloria Cheung, Marina DiStefano, Steven Harrison, Justyne Ross, Hannah Dziadzio, Rachel Shapira, Clarissa Klein, Deborah Ritter, Aleks Milosavljevic, Sharon Plon, Teri Klein

Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA,

Molecular and Human Genetics Department, Baylor College of Medicine, Houston, TX,

Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA,

Pediatrics-Hematology-Oncology, Baylor College of Medicine, Houston, TX,

Department of Genetics, University of North Carolina, Chapel Hill, NC,

Ambry Genetics, Aliso Viejo, CA

The NIH-funded Clinical Genome Resource Consortium (ClinGen) has developed a suite of tools that support variant classification. Several of these tools focus on the application of evidence criteria and classification of variants based on the ACMG/AMP sequence variant classification guidelines, and then the dissemination of those classifications to the global community. Among these tools is the Variant Curation Interface (VCI: https://curation.clinicalgenome.org ), an open-source variant classification platform that supports the FDA-recognized human variant curation process by ClinGen Variant Curation Expert Panels (VCEPs). The Linked Data Hub (LDH: https://ldh.genome.network/srvc) also supports variant curation, by providing efficient access to collated and standardized variant information from different data sources, data provenance information, and links to data sources made available through RESTful APIs. The LDH ingests evidence from key resources in non-standard formats and provides the VCI with this evidence in a structured, machine-readable standardized format to facilitate evidence data incorporation into variant curation pipelines.

One challenge for variant curation platforms such as the VCI involves identifying these critically valuable resources, accurately presenting the relevant evidence for specific variants, and providing evidence relevant to a given disease in a structured format for variant curators to review and then use in variant classification. Here we present how the LDH and VCI have addressed this challenge. Specifically we describe two cases where the LDH acquired data from external resources (in this case the pathogenicity predictor REVEL, and the Genome Aggregation Database (gnomAD) version 4.1 release), and provided it such that the VCI can appropriately access and display the data to curators within the curation interface. We describe how these use cases highlight the need for data standardization, version control, and user-driven feedback.

https://f1000research.com/posters/13-1434

Sonnet

General

Poster only

Exploring the Influence of Biased In-Context Learning in Vision-Language Models

Sonnet Xu, Joseph Janizek, Roxana Daneshjou

Daneshjou Lab, Department of Biomedical Data Science, Stanford University, CA, USA,

Internal Medicine Residency, Virginia Mason Medical Center, Seattle, WA; Department of Radiology, Stanford University, CA, USA,

Daneshjou Lab, Department of Biomedical Data Science, Stanford University, CA, USA

In-context learning (ICL), which involves prompting models with task examples, has been studied as a method to enhance image classification in large multimodal models (LMMs). While previous research has shown that increasing example numbers can improve predictive performance in medical applications, the impact of example selection on model bias remains unexplored. In this work, we develop a framework for assessing how ICL can impact bias. Using the Diverse Dermatology Images dataset, this study examined how ICL affects bias through three experiments using dark skin, light skin, and balanced dark skin/light skin examples. For dark skin tones (FST 5/6), F-scores varied based on demonstration composition. With balanced examples, F-scores improved from 0.43 (zero-shot) to 0.58 (80-shot). When using only FST 1/2 examples, F-scores started at 0.43, changed to 0.46 (20-shot), and reached 0.53 (80-shot). When using only FST 5/6 examples, F-scores moved from 0.43 to 0.61 (80-shot). For light skin tones (FST 1/2), F-scores started at 0.37 (zero-shot) across all conditions. With balanced examples, F-scores reached 0.36 (20-shot) and 0.35 (80-shot). Using only FST 1/2 examples, F-scores were 0.39 (20-shot) and 0.36 (80-shot). With FST 5/6 examples, F-scores decreased to 0.34 (20-shot) and 0.36 (80-shot). These findings suggest that while ICL can improve overall performance across skin types, it may simultaneously exacerbate biases, particularly when examples lack demographic diversity. F-scores vary substantially by demographic group and demonstration composition, which highlights the importance of considering demographic balance in demonstration selection for medical image classification tasks.

N/A

Mehmet

Koyuturk

General

Poster only

RokaiXplorer for interactive analysis of phospho-proteomic data

Yilmaz

Serhan Yilmaz, Filipa Lopes, Marzieh Ayati, Daniela Schlatzer, Mark R. Chance, Mehmet Koyuturk

Case Western Reserve University, Case Western Reserve University, University of Texas Rio Grande Valley, Case Western Reserve University, Case Western Reserve University, Case Western Reserve University

RokaiXplorer is a web service for interactive exploration of phosphorylation data. In addition to providing customized analysis and visualization of phospho-proteomic data at the peptide, site, protein, kinase, and pathway levels, RokaiXplorer facilitates the exploration of these results in the context of kinase interaction networks. RokaiXplorer visualizes the interactions of these kinases with the substrates that contribute most to the inferred activities, providing context and explainability for the predictions of kinase activity inference. Importantly, RokaiXplorer facilitates deployment of a separate web service for each dataset, a utility that enables data generators to share their data "live" (i.e., allowing reviewers and readers of their papers to explore the data and the results).

N/A

rokaixplorer_poster1.pdf

Kord

Kober

General

Poster only

Exploratory Evaluation of a Transcriptomics-Based Drug Repurposing Pipeline to Identify Compounds for Paclitaxel-Induced Peripheral Neuropathy

Yuen

Brian, Yuen

Khai, Kober

Esther, Chavez-Iglesias

Kord, Kober

UCSF School of Nursing, University of Central Florida College of Medicine

UC Irvine School of Pharmacy and Pharmaceutical Sciences

UCSF School of Nursing

UCSF School of Nursing, UCSF Helen Diller Family Comprehensive Cancer Center, UCSF Bakar Computational Health Sciences Institute

Background: Peripheral neuropathy is a major dose-limiting toxicity of paclitaxel, a commonly used drug to treat breast cancer. However, treatments for paclitaxel-induced peripheral neuropathy (PIPN) have limited efficacy. The goal of this study is to leverage transcriptomics data combined with publicly available drug screening data and evaluate a computational drug-repurposing pipeline to identify therapies from existing drugs based on expression reversal.

Methods: Bulk RNA-seq gene expression data were obtained from a publicly available study (GSE113941) of nine pools of mice: five treated with paclitaxel at a dose sufficient to induce PIPN (n=5) and four untreated (n=4). Differential gene expression was evaluated using gene count data with edgeR. A transcriptomics-based drug repurposing pipeline was then applied to the resulting signature to identify PIPN-drug pairs with opposite transcriptional effects. Reversal scores were calculated for each drug in the CMap dataset (1000 small-molecule drugs) and for 66,510 LINCS signatures. Significance was assessed using permutation analysis (n=10,000). Drug hits with a nominal p-value < 0.005 and a reversal score < 0 (indicating signature reversal) were examined. Significant enrichment of pathways was evaluated with DrugEnrichr (FDR < 0.01).

Results: 442 drug signatures were identified. Four drugs (resveratrol, geldanamycin, rolipram, and tamoxifen) have been previously identified as demonstrating treatment efficacy for peripheral neuropathy, suggesting this approach is effective in identifying compounds to treat PIPN. Of the identified 135 KEGG enriched pathways, 120 (e.g., Focal adhesion, HIF-1 signaling, Cellular senescence) were also identified in a rat preclinical model of acute PIPN and 36 (e.g., mitophagy, HIF-1 signaling, ferroptosis, axon guidance) were previously identified by our research team as potential targets in a series of studies of 50 breast cancer survivors.

Conclusions: These findings suggest this approach may provide a productive alternative route to drug discovery for treating this severe adverse side effect of neurotoxic chemotherapy.
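The expression-reversal scoring and permutation test described in the Methods can be illustrated with a small sketch. The abstract does not give the scoring formula, so the score used here (the mean product of disease and drug log fold changes over shared genes) is a simplified stand-in chosen for illustration; the function names and toy values are hypothetical.

```python
import random

def reversal_score(disease_logfc, drug_logfc):
    """Mean product of log fold changes over shared genes.

    Negative values mean the drug tends to move genes in the direction
    opposite to the disease signature (a simplified, assumed score).
    """
    shared = sorted(set(disease_logfc) & set(drug_logfc))
    return sum(disease_logfc[g] * drug_logfc[g] for g in shared) / len(shared)

def permutation_p_value(disease_logfc, drug_logfc, n_perm=10_000, seed=0):
    """One-sided p-value: how often shuffled drug profiles score as low as observed."""
    rng = random.Random(seed)
    shared = sorted(set(disease_logfc) & set(drug_logfc))
    observed = reversal_score(disease_logfc, drug_logfc)
    drug_values = [drug_logfc[g] for g in shared]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(drug_values)  # break the gene-to-value pairing
        permuted = sum(disease_logfc[g] * v
                       for g, v in zip(shared, drug_values)) / len(shared)
        if permuted <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Toy signatures in which the drug reverses every disease gene.
disease = {"A": 2.0, "B": -1.5, "C": 1.0, "D": -0.5}
drug = {"A": -1.8, "B": 1.2, "C": -0.9, "D": 0.4}
score = reversal_score(disease, drug)
p_value = permutation_p_value(disease, drug, n_perm=1_000)
```

With only four genes the permutation p-value cannot become very small; real signatures span many more genes, which is what makes a threshold such as p < 0.005 attainable.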

https://f1000research.com/posters/13-1467

pipn_rges_psb_poster_20241129_online_1240.pdf