
In-situ data processing with DASK: Public Webinar

27 March 2025, 10 am

Webinar organized by CASTIEL2

Abstract: In situ data processing refers to the online processing of data produced by large-scale parallel simulations, reducing the amount of data saved to disk and avoiding the I/O bottleneck that affects large supercomputers. Various solutions based on the classical MPI model exist and can be very efficient. However, today's data analytics landscape is dominated by Python packages that offer highly productive environments familiar to many users (NumPy, Pandas, scikit-learn, PyTorch, JAX). While these packages often support only limited parallelization, frameworks like Dask (https://www.dask.org) and Ray (https://ray.io) provide seamless parallelization for processing large datasets through distributed task-based programming. The task programming paradigm reduces parallelization effort: users express potential parallelism as a graph of functions, where functions without data dependencies can be executed in parallel. At execution time, a scheduler assigns tasks to available cluster nodes, taking dependencies and data locality into account. In this talk we will show how MPI-based solvers can be combined with Dask-based in situ data processing. We will present work done at INRIA and Maison de la Simulation on leveraging Dask to enable in situ processing of data produced by MPI codes. We will discuss several technical points, including data and control transfer between MPI and Dask, efficient data selection to minimize unnecessary transfers, a metadata system for describing data structure and encoding, adaptation between simulation and analytics data structures, and performance at scale. This work is part of the EoCoE-III project, which will deploy in situ analytics at large scale for the Gysela and ParFlow solvers.
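To illustrate the task programming paradigm the abstract describes, here is a minimal sketch using Dask's `delayed` interface. The `load`, `analyze`, and `combine` functions are hypothetical stand-ins, not code from the presented work: each decorated call records a node in a task graph rather than executing immediately, and the scheduler is free to run the independent branches in parallel.

```python
import dask


# Each @dask.delayed call builds a node in a task graph instead of
# running immediately; execution is deferred until .compute().
@dask.delayed
def load(i):
    # Hypothetical stand-in for reading one chunk of simulation output.
    return list(range(i * 4, i * 4 + 4))


@dask.delayed
def analyze(chunk):
    # Hypothetical per-chunk reduction.
    return sum(chunk)


@dask.delayed
def combine(parts):
    # Final aggregation; depends on all analyze tasks.
    return sum(parts)


# The four load->analyze branches share no data dependencies, so the
# scheduler may execute them in parallel; combine waits on all of them.
total = combine([analyze(load(i)) for i in range(4)])
print(total.compute())  # → 120
```

With a distributed scheduler, the same graph is executed across cluster nodes, with placement decisions that account for data locality, which is the mechanism the talk builds on for in situ analytics.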

Registration link