Load Balancing in Large Scale Bayesian Inference

Citation:

D. Wälchli, et al., “Load Balancing in Large Scale Bayesian Inference,” in PASC '20: Proceedings of the Platform for Advanced Scientific Computing Conference, 2020, pp. 1-12.

Date Presented:

June 2020

Abstract:

We present a novel strategy to improve load balancing for large-scale Bayesian inference problems. Load imbalance can be particularly destructive in generation-based uncertainty quantification (UQ) methods, since all compute nodes in a large-scale allocation have to synchronize after every generation and therefore remain idle until the longest model evaluation finishes. Our strategy relies on the concurrent scheduling of independent Bayesian inference experiments that share a group of worker nodes, reducing the destructive effects of workload imbalance in population-based sampling methods.
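The effect described above can be illustrated with a toy simulation. The sketch below is not the authors' implementation: the function names, the lognormal runtime distribution, and the greedy first-come-first-served scheduler are all assumptions made for illustration. It compares running experiments one after another, with a barrier after every generation, against letting all experiments draw on one shared pool of workers while each keeps its own per-generation barrier.

```python
import heapq
import random
from collections import deque


def make_workload(n_experiments, n_generations, n_samples, seed=0):
    """Hypothetical workload: workload[e][g] lists per-sample model runtimes."""
    rng = random.Random(seed)
    return [[[rng.lognormvariate(0.0, 1.0) for _ in range(n_samples)]
             for _ in range(n_generations)]
            for _ in range(n_experiments)]


def generation_makespan(durations, n_workers):
    """Greedy scheduling of one generation; returns when the last sample ends."""
    free_at = [0.0] * n_workers  # a list of equal values is already a valid heap
    for d in durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + d)
    return max(free_at)


def sequential_makespan(workload, n_workers):
    """One experiment at a time; all workers wait at every generation barrier."""
    return sum(generation_makespan(gen, n_workers)
               for exp in workload for gen in exp)


def concurrent_makespan(workload, n_workers):
    """All experiments share the workers; barriers are per experiment only."""
    generation = [0] * len(workload)
    remaining = [len(exp[0]) for exp in workload]
    ready = deque((e, d) for e, exp in enumerate(workload) for d in exp[0])
    running = []  # heap of (finish_time, experiment)
    idle, now = n_workers, 0.0
    while ready or running:
        # Hand ready samples to idle workers, regardless of experiment.
        while ready and idle > 0:
            e, d = ready.popleft()
            heapq.heappush(running, (now + d, e))
            idle -= 1
        # Advance time to the next finished model evaluation.
        now, e = heapq.heappop(running)
        idle += 1
        remaining[e] -= 1
        if remaining[e] == 0:  # experiment e reached its generation barrier
            generation[e] += 1
            if generation[e] < len(workload[e]):
                nxt = workload[e][generation[e]]
                remaining[e] = len(nxt)
                ready.extend((e, d) for d in nxt)
    return now


def efficiency(workload, n_workers, makespan):
    """Fraction of worker time spent on model evaluations."""
    total = sum(d for exp in workload for gen in exp for d in gen)
    return total / (n_workers * makespan)


if __name__ == "__main__":
    w = make_workload(n_experiments=4, n_generations=5, n_samples=8, seed=1)
    for name, fn in [("sequential", sequential_makespan),
                     ("concurrent", concurrent_makespan)]:
        t = fn(w, 8)
        print(f"{name}: makespan={t:.2f}, efficiency={efficiency(w, 8, t):.2%}")
```

As a small worked case, take two experiments with a single generation each, with sample runtimes [1, 3] and [3, 1], on two workers. Run back to back, each generation takes 3 time units, for a total of 6. With the shared pool, the worker that finishes the short sample immediately picks up work from the other experiment, and everything completes after 4 time units.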

To demonstrate the efficiency of our method, we infer parameters of a red blood cell (RBC) model. We perform a data-driven calibration of the RBC's membrane viscosity by applying hierarchical Bayesian inference methods. To this end, we employ a computational model to simulate the relaxation of an initially stretched RBC towards its equilibrium state. The results of this work advance upon the current state of the art towards realistic blood flow simulations by providing inferred parameters for the RBC membrane viscosity.

We show that our strategy achieves a notable reduction in imbalance and significantly improves effective node usage on 512 nodes of the CSCS Piz Daint supercomputer. Our results show that, by enabling multiple independent sampling experiments to run concurrently on a given allocation of supercomputer nodes, our method sustains high computational efficiency in a large-scale supercomputing setting.

Last updated on 09/17/2021