Proceedings

2020

P. Karnakov, F. Wermelinger, S. Litvinov, and P. Koumoutsakos, “Aphros: High Performance Software for Multiphase Flows with Large Scale Bubble and Drop Clusters,” in PASC '20: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC '20, 2020, pp. 1–10.
We present the high performance implementation of a new algorithm for simulating multiphase flows with bubbles and drops that do not coalesce. The algorithm is more efficient than the standard multi-marker volume-of-fluid method since the number of required fields does not depend on the number of bubbles. The capabilities of our methods are demonstrated on simulations of a foaming waterfall where we analyze the effects of coalescence prevention on the bubble size distribution and show how rising bubbles cluster up as foam on the water surface. Our open-source implementation enables high throughput simulations of multiphase flow, supports distributed as well as hybrid execution modes and scales efficiently on large compute systems.
D. Wälchli, et al., “Load Balancing in Large Scale Bayesian Inference,” in PASC '20: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC '20, 2020, pp. 1–12.

We present a novel strategy to improve load balancing for large-scale Bayesian inference problems. Load imbalance can be particularly destructive in generation-based uncertainty quantification (UQ) methods, since all compute nodes in a large-scale allocation have to synchronize after every generation and therefore remain idle until the longest model evaluation finishes. Our strategy relies on the concurrent scheduling of independent Bayesian inference experiments that share a group of worker nodes, reducing the destructive effects of workload imbalance in population-based sampling methods.

To demonstrate the efficiency of our method, we infer parameters of a red blood cell (RBC) model. We perform a data-driven calibration of the RBC's membrane viscosity by applying hierarchical Bayesian inference methods. To this end, we employ a computational model to simulate the relaxation of an initially stretched RBC towards its equilibrium state. The results of this work advance the current state of the art towards realistic blood flow simulations by providing inferred parameters for the RBC membrane viscosity.

We show that our strategy achieves a notable reduction in imbalance and significantly improves effective node usage on 512 nodes of the CSCS Piz Daint supercomputer. Our results show that, by enabling multiple independent sampling experiments to run concurrently on a given allocation of supercomputer nodes, our method sustains a high computational efficiency in a large-scale supercomputing setting.

2019

P. Karnakov, S. Litvinov, J. M. Favre, and P. Koumoutsakos, “Breaking waves: to foam or not to foam?” in 72nd Annual Meeting of the APS Division of Fluid Dynamics - Gallery of Fluid Motion Award Winner, 2019.
P. Karnakov, F. Wermelinger, M. Chatzimanolakis, S. Litvinov, and P. Koumoutsakos, “A High Performance Computing Framework for Multiphase, Turbulent Flows on Structured Grids,” in PASC '19: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC '19, 2019, pp. 1–9.
We present a high performance computing framework for multiphase, turbulent flows on structured grids. The computational methods are validated on a number of benchmark problems, such as the Taylor-Green vortex, that are extended by the inclusion of bubbles in the flow field. We examine the effect of bubbles on the turbulent kinetic energy dissipation rate and provide extensive data for bubble trajectories and velocities that may assist the development of engineering models. The implementation of the present solver on massively parallel, GPU-enhanced architectures allows for large-scale and high-throughput simulations of multiphase flows.
G. Arampatzis, D. Wälchli, P. Weber, H. Rästas, and P. Koumoutsakos, “(μ, λ)-CCMA-ES for Constrained Optimization with an Application in Pharmacodynamics,” in PASC '19: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC '19, 2019, pp. 1–9.
We present the algorithm CCMA-ES, an extension to CMA-ES, an evolution strategy that has been shown to perform well in a broad range of black-box optimization problems. The (μ, λ)-CMA-ES effectively handles nonlinear, nonconvex functions but faces difficulties in constrained optimization problems. We introduce viability boundaries to improve the search for an initial point in the valid domain and adapt the covariance matrix using normal approximations to maintain the inequality constraints. Using benchmark problems from CEC 2006, we compare the performance of CCMA-ES with a state-of-the-art optimization algorithm (mViE), showing favorable results. Finally, CCMA-ES is applied to a pharmacodynamics problem describing tumor growth, and we demonstrate that CCMA-ES outperforms mViE in terms of the objective function value and total function evaluations.
G. Novati and P. Koumoutsakos, “Remember and Forget for Experience Replay,” in Proceedings of the 36th International Conference on Machine Learning, vol. 97, pp. 4851–4860, 2019.
Experience replay (ER) is a fundamental component of off-policy deep reinforcement learning (RL). ER recalls experiences from past iterations to compute gradient estimates for the current policy, increasing data-efficiency. However, the accuracy of such updates may deteriorate when the policy diverges from past behaviors and can undermine the performance of ER. Many algorithms mitigate this issue by tuning hyper-parameters to slow down policy changes. An alternative is to actively enforce the similarity between policy and the experiences in the replay memory. We introduce Remember and Forget Experience Replay (ReF-ER), a novel method that can enhance RL algorithms with parameterized policies. ReF-ER (1) skips gradients computed from experiences that are too unlikely with the current policy and (2) regulates policy changes within a trust region of the replayed behaviors. We couple ReF-ER with Q-learning, deterministic policy gradient and off-policy gradient methods. We find that ReF-ER consistently improves the performance of continuous-action, off-policy RL on fully observable benchmarks and partially observable flow control problems.

2017

G.-H. Cottet and P. Koumoutsakos, “High Order Semi-Lagrangian Particle Methods,” in Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2016, M. Bittencourt, N. Dumont, and J. S. Hesthaven, Eds. Springer, 2017, pp. 103–117.
Semi-Lagrangian (or remeshed) particle methods are conservative particle methods where the particles are remeshed at each time step. The numerical analysis of these methods shows that their accuracy is governed by the regularity and moment properties of the remeshing kernel and that their stability is guaranteed by a Lagrangian condition which does not rely on the grid size. Turbulent transport and, more generally, advection-dominated flows are applications where these features make them appealing tools. The adaptivity of the method and its ability to capture fine scales at minimal cost can be further reinforced by remeshing particles on adapted grids, in particular through wavelet-based multi-resolution analysis.
S. Verma, G. Novati, F. Noca, and P. Koumoutsakos, “Fast Motion of Heaving Airfoils,” in Procedia Computer Science – ICCS 2017, 2017, vol. 108, pp. 235–244.
Heaving airfoils can provide invaluable physical insight regarding the flapping flight of birds and insects. We examine the thrust-generation mechanism of oscillating foils by coupling two-dimensional simulations with multi-objective optimization algorithms. We show that the majority of the thrust originates from the creation of low pressure regions near the leading edge of the airfoil. We optimize the motion of symmetric airfoils, exploiting the Knoller-Betz-Katzmayr effect, to attain high speed and lower energy expenditure. The results of the optimization indicate an inverse correlation between energy efficiency and the heaving frequency and amplitude for a purely heaving airfoil.
S. Verma, P. Hadjidoukas, P. Wirth, and P. Koumoutsakos, “Multi-objective optimization of artificial swimmers,” in 2017 IEEE Congress on Evolutionary Computation (CEC), 2017, pp. 1037–1046.
A fundamental understanding of how various biological traits and features provide organisms with a competitive advantage can help us improve the design of several mechanical systems. Numerical optimization can be invaluable for this purpose, by allowing us to scrutinize the evolution of specific biological adaptations. Importantly, the use of numerical optimization can help us overcome limiting constraints that restrict the evolutionary capability of biological species. Thus, we couple high-fidelity simulations of self-propelled swimmers with evolutionary optimization algorithms to examine peculiar swimming patterns observed in a number of fish species. More specifically, we investigate the intermittent form of locomotion referred to as ‘burst-and-coast’ swimming, which involves a few quick flicks of the fish’s tail followed by a prolonged unpowered glide. This mode of swimming is believed to confer energetic benefits, in addition to several other advantages. We discover a range of intermittent-swimming patterns, the most efficient of which resembles the swimming behaviour observed in live fish. We also discover patterns which lead to a marked increase in swimming speed, albeit with a significant increase in energy expenditure. Notably, the use of multi-objective optimization reveals locomotion patterns that offer optimal trade-offs between speed and efficiency, which can be invaluable for use in robotic applications. The analyses presented may also be extended for optimal design and control of airborne vehicles. As an additional goal of the paper, we highlight the ease with which disparate codes can be coupled via the software framework used, without encumbering the user with the details of efficient parallelization.
A. Economides, et al., “Towards the Virtual Rheometer,” in Proceedings of the Platform for Advanced Scientific Computing – PASC '17, 2017.
Recent advances in medical research and bio-engineering have led to the development of devices capable of handling fluids and biological matter at the microscale. The operating conditions of medical devices are constrained to ensure that characteristic properties of blood flow, such as mechanical properties and local hemodynamics, are not altered during operation. These properties are a consequence of the red blood cell (RBC) microstructure, which changes dynamically according to the device geometry. Understanding the mechanics and dynamics that govern the interactions between RBCs is crucial for the quantitative characterization of blood flow, a stepping stone towards the design of patient-specific medical devices in the context of personalized medicine. This can be achieved by analyzing the microstructural characteristics of the RBCs and studying their dynamics. In this work we focus on the quantification of the microstructure of high- and low-hematocrit blood flows in wall-bounded geometries. We present distributions of the RBCs according to selected deformation criteria and dynamic characteristics, and elaborate on mechanisms that control their collective behavior, focusing on the interplay between cells and shear-induced effects.

2016

P. E. Hadjidoukas, et al., “High throughput simulations of two-phase flows on Blue Gene/Q,” in Parallel Computing: On the Road to Exascale – ParCo 2015, 2016, vol. 27, pp. 767–776.
CUBISM-MPCF is a high-throughput software for two-phase flow simulations that has demonstrated unprecedented performance in terms of floating point operations, memory traffic and storage. The software has been optimized to take advantage of the features of the IBM Blue Gene/Q (BGQ) platform to simulate cavitation collapse dynamics using up to 13 trillion computational elements. The performance of the software has been shown to reach an unprecedented 14.4 PFLOP/s on 1.6 million cores, corresponding to 72% of the peak on the 20 PFLOP/s Sequoia supercomputer. It is important to note that, to the best of our knowledge, no flow simulations have ever been reported exceeding 1 trillion elements and reaching more than 1 PFLOP/s or more than 15% of peak. In this work, we first extend CUBISM-MPCF with a more accurate numerical flux and then summarize and evaluate the most important software optimization techniques that allowed us to reach 72% of the theoretical peak performance on BGQ systems. Finally, we show recent simulation results from cloud cavitation comprising 50000 vapor bubbles.
F. Wermelinger, B. Hejazialhosseini, P. Hadjidoukas, D. Rossinelli, and P. Koumoutsakos, “An Efficient Compressible Multicomponent Flow Solver for Heterogeneous CPU/GPU Architectures,” in Proceedings of the Platform for Advanced Scientific Computing – PASC '16, 2016.
We present a solver for three-dimensional compressible multicomponent flow based on the compressible Euler equations. The solver is based on a finite volume scheme for structured grids and advances the solution using an explicit Runge-Kutta time stepper. The numerical scheme requires the computation of the flux divergence based on an approximate Riemann problem. The computation of the divergence quantity is the most expensive task in the algorithm. Our implementation takes advantage of the compute capabilities of heterogeneous CPU/GPU architectures. The computational problem is organized in subdomains small enough to be placed into the GPU memory. The compute-intensive stencil scheme is offloaded to the GPU accelerator while the solution is advanced in time on the CPU. Our method to implement the stencil scheme on the GPU is not limited to applications in fluid dynamics. The performance of our solver was assessed on Piz Daint, a Cray XC30 supercomputer at CSCS. The GPU code is memory-bound and achieves a per-node performance of 462 Gflop/s, outperforming the multicore-based, Gordon Bell-winning CUBISM-MPCF solver [16] by 3.2× for the offloaded computation on the same platform. The focus of this work is on the per-node performance of the heterogeneous solver. In addition, we examine the performance of the solver across 4096 compute nodes. We present simulations for the shock-induced collapse of an aligned row of air bubbles submerged in water using 4 billion cells. Results show a final pressure amplification that is 100× stronger than the strength of the initial shock.

2015

D. Rossinelli, et al., “The In-Silico Lab-on-a-Chip: Petascale and High-Throughput Simulations of Microfluidics at Cell Resolution,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis – SC '15, 2015, Article 2.
We present simulations of blood and cancer cell separation in complex microfluidic channels with subcellular resolution, demonstrating unprecedented time to solution, performing at 65.5% of the available 39.4 PetaInstructions/s in the 18,688 nodes of the Titan supercomputer. These simulations outperform by one to three orders of magnitude the current state of the art in terms of numbers of simulated cells and computational elements. The computational setup emulates the conditions and the geometric complexity of microfluidic experiments, and our results reproduce the experimental findings. These simulations provide sub-micron resolution while accessing time scales relevant to engineering designs. We demonstrate an improvement of up to 45× over competing state-of-the-art solvers, thus establishing the frontiers of simulations by particle-based methods. Our simulations redefine the role of computational science for the development of microfluidics, a technology that is becoming as important to medicine as integrated circuits have been to computers.
P. E. Hadjidoukas, D. Rossinelli, B. Hejazialhosseini, and P. Koumoutsakos, “From 11 to 14.4 PFLOPs: Performance Optimization for Finite Volume Flow Solver,” in Proceedings of the 3rd International Conference on Exascale Applications and Software – EASC '15, 2015, pp. 7–12.
CUBISM-MPCF is a compressible, two-phase flow solver that has performed unprecedented flow simulations, employing 13 trillion computational elements to study cavitation collapse of a cloud composed of 15’000 bubbles. The code had been deployed on 1.6 million cores of the Sequoia IBM BlueGene/Q supercomputer, initially reaching 11 PFLOPs, corresponding to 55% of its nominal peak performance. This paper reports, for the first time, the techniques used to extend the performance of the code by 30%, reaching 14.4 PFLOPs on BlueGene/Q systems. The achieved 72% of the peak performance constitutes to date the best performance for flow simulations in supercomputer architectures. Our techniques take advantage of the underlying hardware capabilities and were applied through all levels in the software abstraction, aiming at full exploitation of the inherent instruction/data-, thread- and cluster-level parallelism. The software advances by two to three orders of magnitude the state of the art both in terms of time to solution and geometric complexity of the flow. We believe that the present methods are relevant to all grid-based solvers, and as such they may serve to enhance the capabilities across different areas of simulation-based science.

2013

D. Rossinelli, et al., “11 PFLOP/s simulations of cloud cavitation collapse,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis – SC '13, 2013.
We present unprecedented, high-throughput simulations of cloud cavitation collapse on 1.6 million cores of Sequoia, reaching 55% of its nominal peak performance, corresponding to 11 PFLOP/s. The destructive power of cavitation reduces the lifetime of energy-critical systems such as internal combustion engines and hydraulic turbines, yet it has been harnessed for water purification and kidney lithotripsy. The present two-phase flow simulations enable the quantitative prediction of cavitation using 13 trillion grid points to resolve the collapse of 15’000 bubbles. We advance by one order of magnitude the current state of the art in terms of time to solution, and by two orders the geometrical complexity of the flow. The software successfully addresses the challenges that hinder the effective solution of complex flows on contemporary supercomputers, such as limited memory bandwidth, I/O bandwidth and storage capacity. The present work redefines the frontier of high performance computing for fluid dynamics simulations.

2011

P. Chatelain, M. Gazzola, S. Kern, and P. Koumoutsakos, “Optimization of Aircraft Wake Alleviation Schemes through an Evolution Strategy,” in High Performance Computing for Computational Science – VECPAR 2010, Springer, 2011, pp. 210–221.
We investigate schemes to accelerate the decay of aircraft trailing vortices. These structures are susceptible to several instabilities that lead to their eventual destruction. We employ an Evolution Strategy to design a lift distribution and a lift perturbation scheme that minimize the wake hazard as proposed in [6]. The performance of a scheme is measured as the reduction of the mean rolling moment that would be induced on a following aircraft; it is computed by means of a Direct Numerical Simulation using a parallel vortex particle code. We find a configuration and a perturbation scheme characterized by an intermediate wavelength λ ∼ 4.64, necessary to trigger medium-wavelength instabilities between tail and flap vortices and subsequently amplify long-wavelength modes.