The emergence of deep memory hierarchies and the need for extreme scalability are key issues that applications must address in order to run on the coming pre-exascale and exascale systems.
EoCoE-II proposes a flexible approach spanning Input/Output (IO) methods, ensemble runs and programming models to contribute to the necessary paradigm shift.
From a software engineering point of view, the current implementation of physics diagnostics and post-processing in production codes is intertwined with the numerical core implementation, resulting in monolithic and intricate codebases. Developing and introducing a Parallel Data Interface in production codes will clearly separate these two concerns and improve their modularity, their maintainability and their adaptability to upcoming hardware. Integrating the Parallel Data Interface with a code enables different associated plug-ins to be implemented, dealing with general IO issues, in-situ/in-transit post-processing capabilities or fault-tolerance mechanisms.
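The separation of concerns described above can be illustrated with a minimal sketch: the numerical core only announces named data through an interface, and registered plug-ins decide what to do with it. All names here (`DataInterface`, `expose`, `on_data`) are hypothetical and chosen for illustration; they do not reproduce the actual Parallel Data Interface API.

```python
class DataInterface:
    """Routes named data exposed by the solver to registered plug-ins."""

    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        self._plugins.append(plugin)

    def expose(self, name, data):
        # The numerical core only announces data; plug-ins decide what to
        # do with it (write it, post-process it, checkpoint it).
        for plugin in self._plugins:
            plugin.on_data(name, data)


class MeanDiagnostic:
    """Example plug-in: an in-situ diagnostic computing a field mean."""

    def __init__(self):
        self.last_mean = None

    def on_data(self, name, data):
        if name == "temperature":
            self.last_mean = sum(data) / len(data)


def run_solver(interface, steps=3):
    # The numerical core contains no diagnostic logic, only expose() calls.
    field = [1.0, 2.0, 3.0]
    for _ in range(steps):
        field = [x + 1.0 for x in field]  # stand-in for a solver update
        interface.expose("temperature", field)


interface = DataInterface()
diag = MeanDiagnostic()
interface.register(diag)
run_solver(interface)
print(diag.last_mean)  # mean of [4.0, 5.0, 6.0] -> 5.0
```

Because the solver never references a specific diagnostic, an IO, post-processing or fault-tolerance plug-in can be swapped in without touching the numerical core.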
EoCoE-II project partners offer expertise in the following domains:
- Experts on Programming Models focus on how to efficiently exploit complex computing nodes with deep memory hierarchies and, possibly, accelerators; how to expose more concurrency in operations; and how to minimize the development effort required to achieve performance portability. No new research is planned in this area; instead, existing developments and established standards or emerging technologies, such as task-based programming models and Kokkos, will be applied.
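The core idea of a task-based model is to decompose work into independent tasks and let a runtime schedule them onto the available hardware, rather than hand-placing work on cores. As a rough analogy only (production codes would use C++ with Kokkos or a task-based runtime, not Python threads), a thread pool can play the role of that runtime:

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(chunk):
    # Toy compute kernel standing in for a real stencil or solver update.
    return [2.0 * x for x in chunk]

data = list(range(8))
# Decompose the domain into chunks; each chunk becomes one task.
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # The pool (the "runtime") maps tasks onto workers; the programmer
    # expresses what can run concurrently, not where it runs.
    futures = [pool.submit(kernel, c) for c in chunks]
    result = [x for f in futures for x in f.result()]

print(result)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
```

The same decomposition, written once against a portable abstraction such as Kokkos, can then target multicore CPUs or accelerators without rewriting the kernels.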
- Experts on Scalable Solvers focus on algorithmic issues that are strongly linked to linear algebra. Refactoring existing solver packages will not be enough to reach exascale; new algorithms are needed to attain the required level of scalability. This wide-ranging expertise will be applied to at least seven codes in which linear algebra has been identified as one of the main bottlenecks.
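To make the linear-algebra bottleneck concrete, here is a minimal conjugate gradient (CG) solver for a symmetric positive-definite system Ax = b. CG is a classic Krylov method whose main kernels (matrix-vector products and dot products) are exactly the operations whose scalability such solver work targets; this dense toy version only illustrates the algorithm, not an exascale implementation, and the example matrix is chosen arbitrarily.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(A, b, tol=1e-10, max_iter=100):
    """Conjugate gradient for SPD A, starting from x = 0."""
    x = [0.0] * len(b)
    r = b[:]          # residual b - A x (x = 0 initially)
    p = r[:]          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol ** 2:
            break
        # Update the search direction to stay A-conjugate to previous ones.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = cg(A, b)
print(x)  # close to [1/11, 7/11]
```

At scale, the dot products in each iteration become global reductions, which is one reason communication-avoiding variants of such Krylov methods are an active algorithmic concern.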
- Experts in Input/Output & Data Flow methods aim to efficiently leverage the capabilities of upcoming hardware, which will further deepen the memory hierarchy (NVRAM, SSD…). The expertise on offer centres on efficiently writing large amounts of data to the parallel file system. Deep memory hierarchies call for a paradigm shift in how fault tolerance is handled and how post-processing is implemented, with in-situ and in-transit capabilities.
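One recurring pattern behind in-transit data handling is overlapping computation with IO: the solver hands snapshots to a dedicated consumer and keeps computing while the data is drained in the background. A minimal sketch of that producer/consumer structure, with the actual parallel-file-system write replaced by a stand-in:

```python
import queue
import threading

snapshots = queue.Queue()
written = []  # stand-in for data landing on a parallel file system

def io_worker():
    # Drains snapshots independently of the compute loop.
    while True:
        item = snapshots.get()
        if item is None:           # sentinel: no more data will arrive
            break
        written.append(len(item))  # a real worker would write to storage

writer = threading.Thread(target=io_worker)
writer.start()

field = [0.0] * 4
for _ in range(3):
    field = [x + 1.0 for x in field]  # compute step continues unblocked
    snapshots.put(list(field))        # hand a copy to the IO thread

snapshots.put(None)  # signal completion
writer.join()
print(written)  # three snapshots of length 4 -> [4, 4, 4]
```

The same structure generalizes to staging data in intermediate memory tiers (NVRAM, SSD) or forwarding it to in-transit post-processing nodes instead of a file.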
- Experts in Ensemble Runs propose integrating software technologies to efficiently run simulation ensembles and, more generally, workflows on the coming pre-exascale and exascale systems. This will potentially include data assimilation. The objective is to provide a flexible and maintainable way of executing simulation ensembles across multiple jobs on a given supercomputer, while enabling communication between distinct ensemble members for ensemble management and data assimilation.
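The ensemble idea can be sketched in miniature: each member integrates the same toy model from a perturbed initial state, and an analysis step then combines the member states; here a plain ensemble mean stands in for the statistics an ensemble-based data assimilation scheme would compute. On a real machine, each member would be a separate parallel job coordinated by an ensemble manager rather than a thread; the model and the perturbations below are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def member(x0, steps=4):
    # Toy model: a contracting fixed-point iteration standing in for a
    # real simulation integrated over several time steps.
    x = x0
    for _ in range(steps):
        x = 0.5 * x + 1.0
    return x

perturbed_starts = [0.0, 2.0, 4.0]  # one perturbed initial state per member

with ThreadPoolExecutor() as pool:
    # Members run concurrently and independently until the analysis step.
    states = list(pool.map(member, perturbed_starts))

# Analysis step: combine member states (communication across members).
ensemble_mean = sum(states) / len(states)
print(ensemble_mean)  # -> 2.0
```

The essential point is the structure, not the model: independent member runs punctuated by collective analysis steps, which is what an ensemble framework must schedule and connect efficiently across jobs.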