The A64FX processor: Understanding streaming kernels and sparse matrix-vector multiplication

Date: 18/11/2020, 10 am
speaker: - Christie L. Alappat, Erlangen Regional Computing Center (RRZE), PhD student in the group of Prof. G. Wellein - Dr. Georg Hager, Erlangen Regional Computing Center (RRZE), Senior researcher in the HPC division at RRZE
abstract: The A64FX CPU powers the current #1 supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival accelerator devices. Generating efficient code for such a new architecture requires a good understanding of its performance features. Using these features, the Erlangen Regional Computing Center (RRZE) team will detail how they construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops. They will describe how the machine model points to peculiarities in the microarchitecture to keep in mind when optimizing applications, and how, applying the ECM model to sparse matrix-vector multiplication (SpMV), they motivate why the CRS matrix storage format is inappropriate and how the SELL-C-sigma format can achieve bandwidth saturation for SpMV. In this context, they will also look into some code optimization strategies that are relevant for A64FX and compare SpMV performance with AMD Rome, Intel Cascade Lake and NVIDIA V100. This webinar, organized by the European Energy-Oriented Center of Excellence (EoCoE), will be hosted by Christie L. Alappat, PhD student at the RRZE, and Dr. Georg Hager, senior researcher in the HPC division at RRZE. The webinar is free and open to everyone, and will be recorded to be later available on the EoCoE YouTube channel.