
Ensemble-based Data Assimilation for Large Scale Simulations

Friday 1st of July 2022, 14:00 CEST 

INRIA – University Grenoble Alpes

Sebastian Friedemann Thesis Defense

I am pleased to invite you to the defense of my PhD thesis on Friday 1st of July 2022 at 14:00.

The defense will be held in the auditorium of the IMAG building (700 avenue Centrale – Domaine Universitaire – 38401 St Martin d’Hères – France). The auditorium is open to the public, and a buffet will follow the defense. If you cannot make it in person but still want to attend, you can follow the live stream of the defense online: https://univ-grenoble-alpes-fr.zoom.us/j/5241916545


Prediction of chaotic and non-linear systems such as the weather or the groundwater cycle relies on a continuous fusion of sensor data (observations) with numerical models to decide on good system trajectories and to compensate for non-linear feedback effects. Ensemble-based data assimilation (DA) is a major method for this purpose. It relies on the propagation of an ensemble of perturbed model realizations (members) that is enriched by the integration of observation data. Performing DA at large scale to capture continental up to global geospatial effects, while running at high resolution to accurately predict impacts from small scales, is computationally demanding. This requires supercomputers leveraging hundreds of thousands of compute nodes, interconnected via high-speed networks. Efficiently scaling DA algorithms to such machines requires carefully designed, highly parallelized workflows that avoid overloading shared resources. Fault tolerance is important too, since the probability of hardware and numerical faults increases with the amount of resources and the number of ensemble members. Existing DA frameworks either use the file system as intermediate storage to provide a fault-tolerant and elastic workflow, which, at large scale, is slowed down by file system overload, or run large monolithic jobs that suffer from intrinsic load imbalance and are very sensitive to numerical and hardware faults. This thesis elaborates on a highly parallel, load-balanced, elastic, and fault-tolerant solution that makes it possible to run statistical, ensemble-based DA efficiently at large scale. We investigate two classes of DA algorithms, the ensemble Kalman filter (EnKF) and the particle filter with sequential importance resampling (SIR), and validate our framework under realistic conditions. Groundwater sensor data is assimilated using a regional hydrological simulation leveraging the ParFlow model.
We efficiently run EnKF with up to 16,384 members on 16,240 compute cores for this purpose. A comparison with an existing state-of-the-art solution on the same domain, running 2,500 members on 20,000 cores, shows that our approach is about 50% faster. We also present performance improvements when running the particle filter with SIR at large scale. These experiments assimilate cloud coverage observations into 2,555 members, i.e., particles, running the Weather Research and Forecasting (WRF) model over the European domain. To manage the many experiments performed on various supercomputers, we developed a specific setup that we also present.

Keywords: Data Assimilation, Ensemble Based, In Situ Processing, EnKF, Particle Filter, High Performance Computing
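
For readers unfamiliar with the particle filter with sequential importance resampling (SIR) named in the abstract, the following is a minimal sketch of a single SIR step for a scalar state. It is an illustration only, not the implementation presented in the thesis. The function name `sir_step`, the Gaussian observation model, the choice of systematic resampling, and all numeric values are assumptions made for this example.

```python
import math
import random


def sir_step(particles, observation, obs_std, rng):
    """One SIR step for scalar states (hypothetical helper for illustration).

    Assumes the observation measures the state directly, with Gaussian
    noise of standard deviation obs_std.
    """
    # Importance weights: Gaussian log-likelihood of the observation.
    log_w = [-0.5 * ((observation - x) / obs_std) ** 2 for x in particles]
    # Subtract the maximum before exponentiating for numerical stability.
    max_log_w = max(log_w)
    weights = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Systematic resampling (an assumed variant): one random offset,
    # then evenly spaced pointers into the cumulative weight distribution.
    n = len(particles)
    offset = rng.random() / n
    resampled = []
    cumulative = weights[0]
    i = 0
    for k in range(n):
        pointer = offset + k / n
        while pointer > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
    return resampled, weights


# Toy usage: 200 particles drawn from a broad prior, one observation at 1.5.
rng = random.Random(0)
prior = [rng.gauss(0.0, 2.0) for _ in range(200)]
posterior, weights = sir_step(prior, observation=1.5, obs_std=0.5, rng=rng)
```

In this toy setting, particles close to the observation receive large weights and are duplicated, while unlikely particles are dropped. In a large-scale setting each particle would be a full model state rather than a single number.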