speaker: Dr. Ulrich Ruede (FAU Erlangen-Nürnberg and CERFACS Toulouse)
date: 28/01/2021, 2 pm
This webinar will focus on parallel matrix-free multigrid for extreme scale computing.
Multigrid is one of the most efficient algorithms for solving the linear systems that arise from elliptic partial differential equations, and matrix-free variants are essential to reach the best possible performance. This will be demonstrated for positive definite systems as they arise in the discretization of the gyrokinetic Poisson equation, as well as for indefinite systems that originate in viscous flow problems.
During this webinar, special attention will be given to the coarse grids of the multigrid hierarchy, to prevent them from becoming a sequential bottleneck. Modern sparse direct methods, and their approximate variants based on block low-rank approximations, will be used.
The talk will include a scalability study aiming to solve a linear system with more than ten trillion unknowns, corresponding to a solution vector of 80 TB in main memory.
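The multigrid hierarchy mentioned above can be illustrated with a toy example. The following sketch (my own minimal illustration, not the matrix-free solver presented in the talk) implements a geometric multigrid V-cycle for the 1D Poisson problem with weighted Jacobi smoothing, full-weighting restriction, and linear interpolation:

```python
# Illustrative 1D geometric multigrid V-cycle for -u'' = f on (0, 1)
# with homogeneous Dirichlet boundary conditions. Toy sketch only.

def residual(u, f, h):
    """r = f - A u for the 3-point stencil A = tridiag(-1, 2, -1)/h^2."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r

def jacobi(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi smoother: damps high-frequency error components."""
    n = len(u)
    for _ in range(sweeps):
        new = u[:]
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            new[i] = (1 - omega) * u[i] + omega * 0.5 * (left + right + h * h * f[i])
        u = new
    return u

def restrict(r):
    """Full weighting: fine grid with n = 2m+1 interior points -> m points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2]
            for i in range(m)]

def prolong(e, n):
    """Linear interpolation from m coarse points back to n = 2m+1 points."""
    out = [0.0] * n
    for i, v in enumerate(e):
        out[2 * i + 1] = v
        out[2 * i] += 0.5 * v
        out[2 * i + 2] += 0.5 * v
    return out

def v_cycle(u, f, h):
    u = jacobi(u, f, h)                        # pre-smoothing
    if len(u) > 1:
        r = restrict(residual(u, f, h))        # coarse right-hand side
        e = v_cycle([0.0] * len(r), r, 2 * h)  # recurse on the error equation
        u = [ui + ei for ui, ei in zip(u, prolong(e, len(u)))]
    u = jacobi(u, f, h)                        # post-smoothing
    return u

n = 63                                         # interior points, h = 1/64
h = 1.0 / (n + 1)
f = [1.0] * n                                  # constant forcing
u = [0.0] * n
for _ in range(10):
    u = v_cycle(u, f, h)
```

A few V-cycles reduce the residual by orders of magnitude independently of the grid size, which is the property that makes multigrid attractive at extreme scale.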
Dr. Ulrich Ruede (FAU Erlangen-Nürnberg and CERFACS, Toulouse) will host this webinar, organized by the European Energy-Oriented Center of Excellence (EoCoE). 
The webinar is free and open to everyone, and will be recorded to be later available on the EoCoE YouTube channel.
speakers: Christie L. Alappat, PhD student in the group of Prof. G. Wellein, Erlangen Regional Computing Center (RRZE); Dr. Georg Hager, senior researcher in the HPC division, Erlangen Regional Computing Center (RRZE)
date: 18/11/2020, 10 am
abstract: The A64FX CPU powers the current #1 supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival those of accelerator devices. Generating efficient code for such a new architecture requires a good understanding of its performance features. Building on these features, the Erlangen Regional Computing Center (RRZE) team will detail how they construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops. They will describe how the machine model points to peculiarities of the microarchitecture that should be kept in mind when optimizing applications, and how, by applying the ECM model to sparse matrix-vector multiplication (SpMV), they motivate why the CRS matrix storage format is unsuitable on this architecture and how the SELL-C-sigma format can achieve bandwidth saturation for SpMV. In this context, they will also look into some code optimization strategies relevant for A64FX and compare SpMV performance with AMD Rome, Intel Cascade Lake and NVIDIA V100. This webinar, organized by the European Energy-Oriented Center of Excellence (EoCoE), will be hosted by Christie L. Alappat, PhD student at RRZE, and Dr. Georg Hager, senior researcher in the HPC division at RRZE. The webinar is free and open to everyone, and will be recorded to be later available on the EoCoE YouTube channel.
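The SELL-C-sigma format mentioned in the abstract sorts rows by length within windows of sigma rows, then packs them into chunks of C rows padded to a common width, so that the inner loop maps cleanly onto SIMD lanes. A minimal Python sketch of the idea (illustrative only; production implementations vectorize the chunk loop in C or assembly):

```python
# Toy sketch of the SELL-C-sigma sparse matrix format.

def to_sell_c_sigma(rows, C, sigma):
    """rows: list of lists of (col, val) pairs, one list per matrix row.
    Sorts rows by length (descending) within windows of sigma rows, then
    groups them into chunks of C rows padded to the chunk's longest row."""
    n = len(rows)
    perm = list(range(n))
    for start in range(0, n, sigma):          # sort each sigma window
        window = perm[start:start + sigma]
        window.sort(key=lambda r: len(rows[r]), reverse=True)
        perm[start:start + sigma] = window
    chunks = []
    for start in range(0, n, C):
        chunk_rows = perm[start:start + C]
        width = max(len(rows[r]) for r in chunk_rows)
        # Column-major storage within the chunk; pad with (col 0, value 0.0).
        cols = [[rows[r][j][0] if j < len(rows[r]) else 0 for r in chunk_rows]
                for j in range(width)]
        vals = [[rows[r][j][1] if j < len(rows[r]) else 0.0 for r in chunk_rows]
                for j in range(width)]
        chunks.append((chunk_rows, cols, vals))
    return chunks

def spmv_sell(chunks, x, n):
    """y = A x with A stored in SELL-C-sigma layout."""
    y = [0.0] * n
    for chunk_rows, cols, vals in chunks:
        for j in range(len(cols)):              # walk the chunk column-major;
            for k, r in enumerate(chunk_rows):  # the k loop maps to SIMD lanes
                y[r] += vals[j][k] * x[cols[j][k]]
    return y

# Small example: a 4x3 sparse matrix.
rows = [[(0, 2.0), (2, 1.0)], [(1, 3.0)],
        [(0, 1.0), (1, 1.0), (2, 1.0)], [(2, 4.0)]]
x = [1.0, 2.0, 3.0]
y = spmv_sell(to_sell_c_sigma(rows, C=2, sigma=4), x, len(rows))
```

Sorting within sigma windows keeps rows of similar length together, minimizing the zero padding that the per-chunk width would otherwise introduce.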
speaker: Jose Alberto Fonseca Castillo, postdoctoral researcher at CEA / Maison de la Simulation
date: 01/07/2020, 11 am
abstract: The software library ParFlow is a complex parallel code used extensively in high-performance computing, specifically for the simulation of surface and subsurface flow. The code discretizes the corresponding partial differential equations using cell-centered finite differences on a uniform hexahedral mesh. Even with current supercomputing resources, uniform meshes may translate into prohibitively expensive computations for certain simulations. A solution to this problem is to employ adaptive mesh refinement (AMR) to enforce a higher mesh resolution only where it is required. To this end, we have delegated ParFlow's mesh management to the parallel AMR library p4est. During this seminar, Jose Fonseca, postdoctoral researcher at CEA / Maison de la Simulation, will present the algorithmic approach used to perform this coupling and our latest efforts to generalize ParFlow's native discretization to the locally refined meshes obtained with p4est.
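The callback-driven refinement style used by p4est can be sketched in a few lines. The following is a hypothetical, minimal Python illustration of the concept (p4est itself is a C library based on linear octrees with space-filling-curve ordering); the refinement indicator is invented for the example:

```python
# Minimal sketch of callback-driven quadtree refinement, in the spirit of
# AMR libraries such as p4est. Quadrants are (x, y, level) on [0, 1]^2,
# with (x, y) the lower-left corner and edge length 1 / 2**level.

def refine(quadrants, should_refine, max_level):
    """Replace each quadrant flagged by the callback with its four
    children, repeating until no quadrant below max_level is flagged."""
    out = []
    stack = list(quadrants)
    while stack:
        x, y, level = stack.pop()
        if level < max_level and should_refine(x, y, level):
            half = 1.0 / (1 << (level + 1))   # child edge length
            for dx in (0.0, half):
                for dy in (0.0, half):
                    stack.append((x + dx, y + dy, level + 1))
        else:
            out.append((x, y, level))
    return out

# Hypothetical indicator: refine toward the point (0.25, 0.25), e.g. a
# region where the solution has steep gradients.
near = lambda x, y, level: \
    (x - 0.25) ** 2 + (y - 0.25) ** 2 < (1.0 / (1 << level)) ** 2

mesh = refine([(0.0, 0.0, 0)], near, max_level=4)
```

The resulting mesh concentrates small quadrants near the point of interest while leaving the rest of the domain coarse, which is exactly the resolution-on-demand behavior the abstract describes.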
speaker: Jaro Hokkanen, Computer Scientist at Forschungszentrum Jülich
date: 10/06/2020, 10.30 am
abstract: Hosted by Jaro Hokkanen, computer scientist at Forschungszentrum Jülich, this webinar will address the GPU implementation of the ParFlow code. ParFlow is a numerical model that simulates the hydrologic cycle from the bedrock to the top of the plant canopy. The original codebase provides an embedded Domain-Specific Language (eDSL) for generic numerical implementations with support for supercomputer environments (distributed-memory parallelism), on top of which the hydrologic numerical core has been built. In ParFlow, the newly developed optional GPU acceleration is built directly into the eDSL headers such that, ideally, parallelizing all loops in a single source file requires only a new header file. This is possible because the eDSL API is used for looping, allocating memory, and accessing data structures. The decision to embed GPU acceleration directly into the eDSL layer resulted in a highly productive and minimally invasive implementation. The eDSL is implemented in the C host language, and the GPU acceleration support is based on CUDA C++. CUDA C++ has been under intense development over the past years, and features such as Unified Memory and host-device lambdas were leveraged extensively in the ParFlow implementation to maximize productivity. Efficient intra- and inter-node data transfer between GPUs rests on a CUDA-aware MPI library and application-side GPU-based data packing routines. The current, moderately optimized ParFlow GPU version runs a representative model up to 20 times faster on a node with two Intel Skylake processors and four NVIDIA V100 GPUs than the original CPU-only version of ParFlow. The eDSL approach and the ParFlow GPU implementation may serve as a blueprint for tackling the challenges of heterogeneous HPC hardware architectures on the path to exascale.
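The core eDSL idea, writing each numerical loop once against a generic looping API and letting a swappable backend decide how it executes, can be sketched in Python. This is a conceptual illustration only (ParFlow's eDSL is C macros with a CUDA C++ backend; the names below are invented):

```python
# Sketch of the eDSL concept: kernels are written once against a generic
# parallel_for; a registered backend decides how the loop executes
# (serial, threaded, GPU, ...). Backend and kernel names are hypothetical.

BACKENDS = {}

def backend(name):
    """Decorator registering a loop-execution backend under a name."""
    def register(fn):
        BACKENDS[name] = fn
        return fn
    return register

@backend("serial")
def run_serial(n, body):
    for i in range(n):
        body(i)

@backend("threads")
def run_threads(n, body):
    # Stand-in for an accelerated backend: same loop, different executor.
    from concurrent.futures import ThreadPoolExecutor
    with ThreadPoolExecutor() as pool:
        list(pool.map(body, range(n)))

def parallel_for(n, body, backend_name="serial"):
    """The single looping API the 'numerical core' is written against."""
    BACKENDS[backend_name](n, body)

# A numerical kernel written once, independent of the execution model:
pressure = [0.0] * 8
def kernel(i):
    pressure[i] = 2.0 * i

parallel_for(8, kernel, backend_name="threads")
```

Because the application only ever calls `parallel_for`, switching the execution model is a matter of selecting a different backend, which mirrors how ParFlow swaps in a GPU-enabled header file without touching the numerical source.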
speaker: Prof. Dr. Eric Sonnendrücker, Head of Numerical Methods in Plasma Physics Division at the Max Planck Institute for Plasma Physics
date: 15/05/2020, 10.30 am
abstract: The principle behind magnetic fusion research is to confine a plasma, a gas of charged particles at a very high temperature of around 100 million degrees, so that the fusion reaction can generate energy with a positive balance. At such a high temperature, the plasma needs to be completely isolated from the wall of the reactor. This isolation can be achieved in toroidal devices thanks to a very large magnetic field. Due to the multiple and complex physical processes involved, theoretical research in this field relies heavily on numerical simulations, and some problems require huge computational resources. After introducing the context of magnetic confinement fusion, we shall address specific challenges for numerical simulations in this area, related in particular to the multiple space and time scales that need to be spanned and to the geometry of the experimental devices. These can only be addressed through a close collaboration between physicists, mathematicians and HPC specialists. A few current research problems in this field, ranging from the computation of a 3D equilibrium to fluid and kinetic simulations, will be presented as illustrations.
speaker: Leonardo Bautista Gomez, Senior researcher at BSC, and Kai Keller, Software engineer at BSC
date: April 1st, 2020, 11 AM
abstract: Large-scale infrastructures for distributed and parallel computing offer thousands of computing nodes to their users to satisfy their computing needs. As the need for massively parallel computing increases in industry and research, cloud infrastructures and computing centers are being forced to grow in size and to transition to new computing technologies. While the advantage for the users is clear, this evolution imposes significant challenges, such as energy consumption and fault tolerance. Fault tolerance is even more critical in infrastructures built on commodity hardware. Recent works have shown that large-scale machines built with commodity hardware experience more failures than previously thought. In this webinar, Leonardo Bautista Gomez and Kai Keller, respectively senior researcher and software engineer at the Barcelona Supercomputing Center, will focus on how to guarantee high reliability for high-performance applications running in large infrastructures. In particular, they will cover the technical content necessary to implement scalable multilevel checkpointing for tightly coupled applications. This will include an overview of the internals of the FTI library, explaining how multilevel checkpointing is implemented today, together with examples that the audience can test and analyze on their own laptops, so that they can learn to use FTI in practice and ultimately transfer that knowledge to their production systems.
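The idea behind multilevel checkpointing is that cheap, less reliable checkpoint levels (e.g. node-local storage) are taken frequently, while expensive, highly reliable ones (e.g. the parallel file system) are taken rarely. A conceptual sketch of such a schedule (this is not the FTI API; level descriptions follow the usual multilevel scheme and the intervals are invented for illustration):

```python
# Conceptual sketch of a multilevel checkpoint schedule. Levels follow the
# common multilevel-checkpointing hierarchy; intervals are made up.

LEVELS = {
    1: "local storage (e.g. node-local SSD)",
    2: "partner copy on a buddy node",
    3: "erasure encoding (e.g. Reed-Solomon) across nodes",
    4: "parallel file system",
}

def checkpoint_level(iteration, intervals={1: 2, 2: 6, 3: 12, 4: 24}):
    """Return the highest (safest) checkpoint level due at this iteration,
    or None. intervals[k] = take a level-k checkpoint every intervals[k]
    iterations; when several levels coincide, the safest one wins."""
    due = [lvl for lvl, every in intervals.items() if iteration % every == 0]
    return max(due) if due else None

# Which level (if any) fires at each of the first 24 iterations:
schedule = [checkpoint_level(i) for i in range(1, 25)]
```

Most failures can then be recovered from a recent cheap checkpoint, and only rare multi-node failures need the expensive file-system copy, which is what makes the approach scale.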
speaker: Herbert Owen, senior researcher at Barcelona Supercomputing Center
date: Monday, April 20th, 2020, 11:00 AM – 12:15 PM
abstract: Wind resource assessment is performed before deciding to construct a new wind farm. Its objective is to obtain a better understanding of the flow over the potential site. It combines experimental data from a few wind masts with numerical simulations to estimate the energy the wind farm can produce and to determine the optimal positions for the wind turbines. The current state of the art in industry is to resort to Reynolds-Averaged Navier-Stokes (RANS) turbulence models for such simulations. While RANS models are computationally affordable and quite robust, they are known to have accuracy limitations in regions of separated flow, for instance behind mountains. Large Eddy Simulation (LES) can provide much more accurate results, at a much higher computational cost. With the advent of exascale computers, LES has become a viable alternative. In this talk, we will present the work we have been doing for Iberdrola during the last six years and the steps we are taking within EoCoE to push toward much higher accuracy through the efficient use of exascale resources.
speaker: Julien Bigot, Senior researcher at CEA
date: March 6th, 2020, 10:00 AM
abstract: Julien Bigot, tenured computer science researcher at CEA, will present the PDI data interface, a declarative API that decouples application codes from the input/output (I/O) strategy they use. He will present its plugin system, which supports selecting the best-suited existing I/O library for each part of the code through a configuration file, depending on the available hardware, the I/O pattern, the problem size, etc. This webinar will demonstrate the advantages of this approach in terms of software engineering and performance through the example of the Gysela5D code.
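The essence of such a declarative interface is that the application only exposes data and emits events, while a configuration file (not the code) decides which plugin handles each event. A conceptual sketch of that decoupling (this is not the real PDI API; the plugin and event names are hypothetical):

```python
# Sketch of declarative, config-driven I/O dispatch in the spirit of PDI.
# Plugin names and the config schema are invented for illustration.

CONFIG = {                      # would normally be parsed from a YAML file
    "checkpoint": "hdf5_plugin",
    "diagnostics": "ascii_plugin",
}

WRITTEN = []                    # stand-in for actual file output

PLUGINS = {
    "hdf5_plugin": lambda name, data: WRITTEN.append(("hdf5", name, data)),
    "ascii_plugin": lambda name, data: WRITTEN.append(("ascii", name, data)),
}

def expose(event, name, data):
    """The application only exposes data under an event name; which I/O
    library handles it is decided entirely by the configuration."""
    plugin = CONFIG.get(event)
    if plugin is not None:
        PLUGINS[plugin](name, data)

expose("checkpoint", "temperature", [300.0, 301.5])
expose("diagnostics", "energy", 42.0)
```

Switching the checkpoint format from HDF5 to another library would here be a one-line change in `CONFIG`, with no change to the simulation code, which is the software-engineering advantage the abstract refers to.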
speaker: Pasqua D'Ambra, Senior Research Scientist at the National Research Council of Italy
date: 24/02/2020, 3PM
abstract: Current applications in computational and data science often require the solution of large and sparse linear systems. The notion of "large" is qualitative, and there is a clear tendency for it to grow; systems with millions or even billions of unknowns are no longer unusual. To solve such systems efficiently on high-end massively parallel computers, the methods of choice are Krylov methods, whose convergence and scalability properties depend on the choice of suitable preconditioning techniques. During this webinar, Pasqua D'Ambra, Senior Research Scientist at the National Research Council of Italy, will present MLD2P4 (MultiLevel Domain Decomposition Parallel Preconditioners Package based on PSBLAS), which provides efficient and easy-to-use preconditioners in the context of the PSBLAS (Parallel Sparse Basic Linear Algebra Subprograms) computational framework. The package, whose features are constantly updated within the Energy-Oriented Center of Excellence (EoCoE) European project, includes multilevel cycles and smoothers widely used in multigrid methods. A purely algebraic approach is applied to generate coarse-level corrections, so that no geometric background is needed concerning the matrix to be preconditioned. We will present the main features of the package and examples of using the main APIs needed to set up the preconditioner, together with its application within the Krylov solvers available from PSBLAS. Results on test cases relevant to the EoCoE application areas will highlight how the PSBLAS/MLD2P4 software framework can be used to obtain highly scalable linear solvers. The PSBLAS library is available at https://github.com/sfilippone/psblas3 and MLD2P4 at https://github.com/sfilippone/mld2p4-2
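The role a preconditioner plays inside a Krylov method can be seen in a minimal example. The sketch below is a pure-Python preconditioned conjugate gradient with a simple one-level Jacobi preconditioner, purely for illustration; MLD2P4 supplies far richer multilevel preconditioners to the PSBLAS Krylov solvers:

```python
# Minimal preconditioned conjugate gradient (dense, pure Python) with a
# Jacobi preconditioner M^{-1} = diag(A)^{-1}. Illustrative sketch only.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(A, x):
    return [dot(row, x) for row in A]

def pcg(A, b, tol=1e-10, maxit=200):
    """Solve A x = b for symmetric positive definite A."""
    n = len(b)
    minv = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner
    x = [0.0] * n
    r = b[:]                                   # residual r = b - A x
    z = [mi * ri for mi, ri in zip(minv, r)]   # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for _ in range(maxit):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [mi * ri for mi, ri in zip(minv, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Small SPD test system: the 1D Laplacian tridiag(-1, 2, -1).
n = 10
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
x = pcg(A, [1.0] * n)
```

Replacing the diagonal `minv` step with an algebraic multilevel cycle, as MLD2P4 does, keeps the iteration structure identical but makes the iteration count nearly independent of the problem size, which is what enables scalability to billions of unknowns.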
speaker: Fabio Durastante (CNR, Naples, Italy)
date: January 20th, 2020
abstract: This tutorial will address the basic functionalities of the PSBLAS library for the parallelization of computationally intensive scientific applications. We will delve into the parallel implementation of iterative solvers for sparse linear systems in a distributed-memory paradigm, and look at the routines for multiplying sparse matrices by dense matrices, solving block-diagonal systems with triangular diagonal entries, preprocessing sparse matrices, and several additional routines for dense matrix operations. We will discuss both the direct use of the library in Fortran 2003 and the use of the C interfaces. The tutorial will include examples relevant to the EoCoE-II application areas and highlight how the PSBLAS environment can be used to obtain scalable parallel codes.