PSB 2025 - Command Line to PipeLine: Cross-Biobank Analyses with Nextflow

Workshop Outline

Workshop Format: This interactive workshop combines presentations, demonstrations, hands-on tutorials, and discussions with our bioinformatics experts.

Informative Presentations: Gain a solid foundation in biobank data analysis and Nextflow's core functionality.
Demonstrations: See real-world examples of configuring and running large-scale pipelines.
Hands-on Tutorials: Develop your skills by building your own Nextflow workflows under expert guidance.
Interactive Exercises: Practice your newfound skills individually and collaboratively, with on-demand support from our team.
Discussions: Continue the conversation on best practices for cross-biobank analyses, with an emphasis on reproducibility, scalability, and security/privacy.

Introduction to Cloud-based Platforms and Pipeline Managers

Discussion of the growth of biobanks and the need for cloud-agnostic workflows.
The role of cloud-based platforms and institutional biobanks in advancing genomic studies.
Introduction to pipeline managers such as Nextflow, focusing on their role in enabling cross-platform computing, portability, and reproducibility.

Module 1: Genomic Pipelines for Biobanks: Development and Deployment

A detailed guide to deploying analysis pipelines across different computing infrastructures, including high-performance computing (HPC) clusters and cloud-based platforms (DNAnexus for UK Biobank (UKB) analyses, and the All of Us Researcher Workbench).
Demonstration of how to use and interpret the genome-wide association study (GWAS) and polygenic score (PGS) pipelines built by our team.

Module 2: Developing Your Own Workflows

Introduction to cloud-agnostic workflow languages with a focus on demystifying Nextflow pipeline management concepts.
Hands-on Tutorial: A gentle introduction that helps intermediate command-line users begin developing their own workflows.
Individual Exercise: Turning your local pipeline into a deployable Nextflow workflow.

Module 3: Overcoming Limitations of Working Across Biobanks & Cloud Platforms

Presentation of resources for overcoming common hurdles, including a compilation of materials in our GitHub repository and strategies for interdisciplinary collaboration.
Group Exercise: Deploying a workflow across cloud environments; coding collaboratively with Google Cloud Shell.
Discussion: An open conversation with our team about challenges and best practices for unifying and scaling your pipelines.

Workshop Organizers

	Anurag Verma, PhD, University of Pennsylvania. Anurag is an Assistant Professor in the Department of Medicine at the University of Pennsylvania and serves as Associate Director of Clinical Informatics and Genomics for the Penn Medicine BioBank (PMBB). His research focuses on the genetic basis of complex diseases, using big data techniques to study the genetic architecture of multimorbidity, the phenotypic architecture of common genetic risk, polygenic risk scores, and phenome-wide association studies, with the aim of identifying the phenotypic and genomic interactions that lead to complex disease. At PMBB, Anurag leads a team called CodeWorks that develops scalable workflows and harnesses both in-house and cloud computing resources to advance genetic research. His team works to expand how data informatics can be applied to keep pace with the rapidly changing landscape of large-scale biobanks.
	Lindsay Guare, University of Pennsylvania. Lindsay is a second-year PhD student in the Genomics and Computational Biology Program at UPenn with a focus in Biomedical Informatics. She has been involved in many large-scale genetic association study collaborations, and her own research will focus on applying innovative computational data science approaches to explore clinical and genetic heterogeneity in endometriosis. Her interdisciplinary background, which includes computer science, supports her leadership role in CodeWorks.
	Katie Cardone, University of Pennsylvania. Katie is a Research Specialist in the Department of Genetics at the University of Pennsylvania and a graduate student in the University of Pennsylvania’s Master of Biomedical Informatics Program. In her role, Katie carries out a wide range of bioinformatic analyses, including genome-wide association studies, phenome-wide association studies, exome-wide rare variant association studies, and polygenic score analyses in large biobanks such as the Penn Medicine BioBank, the eMERGE network, and the All of Us Research Program. She also develops Nextflow pipelines for polygenic score tools.
	Christopher Carson, MS, University of Pennsylvania. Chris is a Bioinformatician at the University of Pennsylvania Institute for Biomedical Informatics. In the Verma lab, his work spans workflow pipeline development, fulfilling genetic analysis requests for the Penn Medicine BioBank (PMBB), and building bioinformatics software for analyzing large-scale genomic and phenomic datasets. He has experience conducting genome-wide, phenome-wide, and exome-wide association studies on the PMBB's large-scale datasets using SAIGE.
	Zachary Rodriguez, PhD, University of Pennsylvania. Zach is a Bioinformatician at the University of Pennsylvania’s Perelman School of Medicine. His research focuses on the genetic basis of complex diseases, using big data techniques to study the genetic architecture of multimorbidity, the phenotypic architecture of common genetic risk, polygenic risk scores, and phenome-wide association studies, with the aim of identifying the phenotypic and genomic interactions that lead to complex disease. His informatics expertise spans machine learning, natural language processing, and pipeline development, and he has extensive experience analyzing large-scale genomic data, electronic health records (EHR), and biobank datasets, including the Penn Medicine BioBank.

The Location is the Icing on the Cake: Beautiful Hawaii!

Moonless Starry Sky Over Mauna Kea Observatory
Sunny South Kona Beach
Kīlauea's summit caldera, Hawaiʻi Volcanoes National Park