This deliverable D1.18 replaces D1.17 Application Perf
The Transversal Basis plays a pivotal role in EoCoE, providing key leverage and know-how for tackling algorithmic bottlenecks and software technology limitations across all four application domains, called the thematic application pillars.
The work is carried out within five thematically grouped tasks designed to offer both long- and short-term remedial measures in the target application areas, as follows:
Numerical methods & Applied mathematics
This task focuses on re-engineering the computationally intensive parts of the application pillar codes so that they are better suited to supercomputers with high concurrency. To do so, tried and trusted mathematical algorithms will be replaced by newer methods with better scaling properties. Because integrating such methods into scientific applications has a strong impact on the codes, these activities are planned over long time frames.
- DIOGENeS: DIscOntinuous GalErkin Nanoscale Solvers, a software suite dedicated to the numerical modeling of light interaction with nanometer-scale structures, with applications to nanophotonics and nanoplasmonics
- Parallel-in-time methods: these have recently been shown to offer a promising way to push beyond the prevailing scaling limits in the numerical solution of time-dependent differential equations
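To make the parallel-in-time idea concrete, here is a minimal sketch of Parareal, one well-known parallel-in-time method, applied to the scalar test equation dy/dt = λy. The choice of Parareal, the use of explicit Euler for both the cheap coarse propagator G and the accurate fine propagator F, and all function names are illustrative assumptions, not the methods used in the pillar codes. Within one iteration the fine solves are independent across time slices, which is where the parallelism comes from; only the cheap coarse correction sweep stays sequential.

```python
def euler(y, lam, t_span, steps):
    """Explicit Euler for dy/dt = lam * y over an interval of length t_span."""
    dt = t_span / steps
    for _ in range(steps):
        y = y + dt * lam * y
    return y


def parareal(y0, lam, T, n_slices, n_iter, fine_steps=100):
    """Parareal for dy/dt = lam * y on [0, T], split into n_slices time slices.

    Coarse propagator G: one Euler step per slice (cheap, inaccurate).
    Fine propagator F: fine_steps Euler steps per slice (expensive, accurate).
    Returns the approximations U[0..n_slices] at the slice boundaries.
    """
    dt = T / n_slices

    def G(y):
        return euler(y, lam, dt, 1)

    def F(y):
        return euler(y, lam, dt, fine_steps)

    # Initial guess: one sequential sweep with the coarse propagator.
    U = [y0]
    for n in range(n_slices):
        U.append(G(U[n]))

    for _ in range(n_iter):
        # Fine and coarse solves on the previous iterate are independent per
        # slice; a real implementation would run them concurrently.
        fine = [F(U[n]) for n in range(n_slices)]
        coarse_old = [G(U[n]) for n in range(n_slices)]
        # Sequential correction sweep:
        # U_new[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n])
        new_U = [y0]
        for n in range(n_slices):
            new_U.append(G(new_U[n]) + fine[n] - coarse_old[n])
        U = new_U
    return U


if __name__ == "__main__":
    U = parareal(1.0, -1.0, 1.0, n_slices=8, n_iter=3)
    print(U[-1])
```

After k iterations the first k slices agree with the sequential fine solution, so the method only pays off when it converges in far fewer iterations than there are slices.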
Systems of linear equations are an integral part of many of the codes found in the application pillars, so these applications rely on robust, high-performance, portable solvers when run on supercomputers. The work in this task focuses on the core development of advanced linear algebra solvers on emerging HPC architectures and their integration into applications from the pillars. In addition, longer-term work on "next generation" linear solvers is planned, such as solvers based on runtime systems, or hybrid methods, as these constitute a promising avenue for efficient exploitation of many-core architectures.
- Parallel sparse direct solvers: very robust solution techniques, possibly memory- and CPU-intensive (MUMPS, PaStiX, QR_mumps)
- Parallel hybrid solvers: a possible trade-off between memory and CPU consumption (ABCD, HPDDM, MaPHyS)
- Algebraic multigrid preconditioners: lighter memory footprint, well suited to solving elliptic-type PDEs (AGMG, PSBLAS & MLD2P4, HHG)
- Additional packages: standalone Krylov solvers, a parallel fast multipole package, dense linear algebra
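As an illustration of the standalone Krylov solvers listed above, the sketch below implements the preconditioned conjugate gradient (CG) method for the 1D finite-difference Laplacian, a model elliptic problem. It is plain Python written for illustration, not the interface of any of the listed packages; the names are hypothetical, and the `apply_prec` hook (identity by default) only marks where a preconditioner, such as one from the algebraic multigrid packages above, would be plugged in.

```python
def poisson_1d_matvec(x):
    """Return A @ x for A = tridiag(-1, 2, -1): the 1D finite-difference
    Laplacian with homogeneous Dirichlet boundaries (a model elliptic PDE)."""
    n = len(x)
    y = []
    for i in range(n):
        v = 2.0 * x[i]
        if i > 0:
            v -= x[i - 1]
        if i < n - 1:
            v -= x[i + 1]
        y.append(v)
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, b, apply_prec=lambda r: list(r), tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite
    operator given as a matvec function. Returns (x, iterations)."""
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)                      # residual b - A x, with x = 0
    z = apply_prec(r)                # preconditioned residual
    p = list(z)                      # first search direction
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz           # keeps the new direction A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    n = 50
    x_true = [float(i % 7) for i in range(n)]
    b = poisson_1d_matvec(x_true)
    x, iters = pcg(poisson_1d_matvec, b)
    print(iters, max(abs(a - c) for a, c in zip(x, x_true)))
```

Without a good preconditioner, the CG iteration count grows with the condition number of A, which is of order n² for this operator; limiting that growth is the purpose of multigrid preconditioning.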
Parallel Input / Output
Performing efficient input/output (IO) for very large datasets on current supercomputers is already challenging and will become even more so on the next generations of supercomputers. In the short term, support is given to all applications from the pillars to improve their IO behaviour on current architectures. In the medium term, more structural work on parallel-IO and fault-tolerance packages will be of immediate benefit to the energy-oriented scientific communities. Longer-term activities aim at providing best practice for IO implementation in scientific codes and software engineering solutions to improve their maintainability.
- FTI: Fault Tolerance Interface for fast and efficient multilevel checkpointing in large scale supercomputers
- XIOS: Flexible and high-throughput framework to manage I/O, data definition and post-processing thanks to a dedicated parallel asynchronous I/O server
- SIONlib: writing and reading binary data to/from several thousands of processors into one or a small number of physical files, to get the most out of the parallel file system
- PDI: Parallel Data Interface to decouple high-performance simulation codes from Input/Output concerns
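The core idea behind task-local I/O into a shared file, as done by tools in the spirit of SIONlib, can be sketched as follows: every task receives a disjoint byte range aligned to the file-system block size, computed as an exclusive prefix sum of the padded chunk sizes (in an MPI code, `MPI_Exscan` could compute it). This is a hypothetical illustration only; it does not reproduce SIONlib's API or on-disk format, and the 4096-byte block size in the example is an assumption.

```python
import os
import tempfile


def shared_file_layout(chunk_sizes, block_size):
    """Assign each task a disjoint, block-aligned byte range in one shared file.

    chunk_sizes[i] is the number of bytes task i wants to write. Every range
    starts on a multiple of block_size, so two tasks never write into the same
    file-system block. Returns (offsets, total_size_in_bytes).
    """
    offsets = []
    pos = 0
    for size in chunk_sizes:
        offsets.append(pos)                # exclusive prefix sum of padded sizes
        n_blocks = -(-size // block_size)  # ceil(size / block_size)
        pos += n_blocks * block_size
    return offsets, pos


def write_task_chunk(path, offset, data):
    """Write one task's data at its own offset; no coordination is needed."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


if __name__ == "__main__":
    chunks = [b"a" * 100, b"b" * 5000, b"", b"d" * 4096]
    offsets, total = shared_file_layout([len(c) for c in chunks], 4096)
    path = os.path.join(tempfile.mkdtemp(), "shared.bin")
    open(path, "wb").close()  # create the shared file once
    for off, data in zip(offsets, chunks):
        write_task_chunk(path, off, data)
    print(offsets, total)
```

Padding each chunk to whole blocks wastes some space but avoids tasks contending for the same block lock on a parallel file system.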
Advanced programming methods and tools for Exascale
Nearly all the major applications in the energy-centred work packages suffer from one or more performance bottlenecks. Moreover, rapid hardware evolution makes it challenging for software to maintain high performance, especially for the coupled, multi-scale programs underlying the inter-disciplinary applications. This task centres on application optimization and parallel scaling, software engineering for many-core architectures, and automatic code generation tools. Activities carried out here have a direct impact on the scientific applications developed in the four pillars, with HPC experts working in close collaboration with domain scientists to improve application performance on current architectures. These collaborations systematically start with a thorough and reproducible performance analysis of the applications. In this context, the following HPC tools are used:
- JUBE: Benchmarking Environment to improve performance analysis reproducibility
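JUBE itself is configured through XML or YAML input files; the following Python sketch does not use JUBE and only mimics the idea that makes such benchmark runs reproducible: a parameter space is expanded into every combination, and each measurement is stored together with the exact parameters that produced it. The function names, the median-of-repeats timing policy and the toy kernel are all hypothetical.

```python
import itertools
import statistics
import time


def expand_parameters(param_space):
    """Expand {name: [values, ...]} into one dict per combination."""
    names = sorted(param_space)
    for values in itertools.product(*(param_space[n] for n in names)):
        yield dict(zip(names, values))


def run_benchmark(kernel, param_space, repeats=3):
    """Run kernel(**params) for every combination and record the parameters
    together with the median wall-clock time over `repeats` runs."""
    results = []
    for params in expand_parameters(param_space):
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(**params)
            times.append(time.perf_counter() - t0)
        results.append({**params, "median_s": statistics.median(times)})
    return results


def toy_kernel(n, stride):
    """Hypothetical stand-in for an application kernel."""
    return sum(range(0, n, stride))


if __name__ == "__main__":
    for row in run_benchmark(toy_kernel, {"n": [10**5, 10**6], "stride": [1, 2]}):
        print(row)
```

Storing the full parameter set next to each timing means any row of the results can be re-run exactly, which is the property a benchmarking environment is meant to guarantee.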