Motivation
Analyses of big biomedical datasets must account not only for the heterogeneity, multidimensionality, noisiness and incompleteness of the data, but also for the substantial computational resources required to process them. Indeed, the data-intensive nature of problems in biomedical informatics warrants the development and use of novel, well-designed algorithms as well as massive computing infrastructure and advanced software tools, including those deployed in the cloud. In this session, we will address issues related to optimizing tool development for large-scale datasets, such as compute time, storage, and the need for parallelization, with a particular focus on computational pattern recognition methods. We are especially interested in innovative approaches to identifying and overcoming the challenges associated with using various types of biomedical data, including but not limited to electronic health records, medical images, genetic sequences, and 'omics data. Finally, our session will also focus on the difficulties that arise when integrating diverse biomedical data (such as unprocessed textual data, multi-omics data, cross-species data, cross-institutional data or summary-level statistics) in order to accurately identify patterns across biomedical datasets.