The A64FX CPU powers the current #1 supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival accelerator devices. Generating efficient code for such a new architecture requires a good understanding of its performance features. Using these features, the Erlangen Regional Computing Center (RRZE) team detailed how they construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops. They described how the machine model points to peculiarities in the microarchitecture to keep in mind when optimizing applications, and how, applying the ECM model to sparse matrix-vector multiplication (SpMV), they motivate why the CRS matrix storage format is inappropriate and how the SELL-C-sigma format can achieve bandwidth saturation for SpMV. In this context, they also looked into some code optimization strategies that are relevant for A64FX and compare SpMV performance with AMD Rome, Intel Cascade Lake and NVIDIA V100. This webinar, organized the 18th of November 2020 by the European Energy-Oriented Center of Excellence (EoCoE), had been hosted by Christie L. Alappat, PhD student at the RRZE, and Dr. Georg Hager, senior researcher in the HPC division at RRZE.
Home > News > Webinar on the A64FX processor: Understanding streaming kernels and sparse matrix-vector multiplication