A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows


Bibliographic Details
Main Authors:	Hoffmann, Nils; Ebrahimi Pour, Neda
Format:	Conference Proceeding
Language:	English
Subjects:	Computational fluid dynamics; Computational Fluid Dynamics (CFD); Computational modeling; Deep learning; Machine Learning; Metadata; Provenance; Runtime; Training; Training data

Description
Summary:	Computational Fluid Dynamics (CFD) simulations are essential in various engineering applications. The use of high-performance computing has significantly expanded the scope of realizable models. However, balancing reasonable time-to-solution expectations with solution accuracy remains a bottleneck for many large-scale simulations. Machine learning (ML) algorithms have gained increasing popularity in the CFD community. Various data-based analysis methods have been deployed to predict CFD solutions and reduce the computational effort. The growing use of ML methods necessitates ensuring the reproducibility and transparency of data-driven methods and their associated training data processing steps to ensure reliability and trustworthiness of predictions. This paper proposes a new method for capturing provenance or lineage data during ML model training while minimizing development overhead by introducing tooling built on the commonly used data pipeline mechanism. To demonstrate the developed tooling, a deep learning model is trained using available CFD simulation data from an engineering test case. We demonstrate that a complete provenance graph of training and test samples can be automatically generated, along with valuable development metadata such as profiling of individual processing steps during model training.
ISSN:	2768-0657
DOI:	10.1109/EuroSPW61312.2024.00092
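
Illustration:	The summary describes capturing provenance and per-step profiling by building on the data pipeline mechanism. The following is a minimal sketch of that general idea, not the authors' tooling: the names `TrackedSample`, `ProvenancePipeline`, and `provenance_edges`, the step names, and the sample identifiers are all hypothetical and chosen only for illustration. Each pipeline step is wrapped so that it records which step touched which sample and how long the step took.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class TrackedSample:
    """A data sample plus the ordered list of steps applied to it (hypothetical type)."""
    value: Any
    source: str
    lineage: List[str] = field(default_factory=list)


class ProvenancePipeline:
    """Hypothetical pipeline wrapper that records lineage and step timings."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Callable[[Any], Any]]] = []
        self.timings: Dict[str, float] = {}

    def map(self, name: str, fn: Callable[[Any], Any]) -> "ProvenancePipeline":
        # Register a named processing step; returning self allows chaining.
        self.steps.append((name, fn))
        self.timings[name] = 0.0
        return self

    def run(self, sources: Dict[str, Any]) -> List[TrackedSample]:
        # Apply every step to every sample, logging lineage and elapsed time.
        results = []
        for source_id, raw in sources.items():
            sample = TrackedSample(value=raw, source=source_id)
            for name, fn in self.steps:
                start = time.perf_counter()
                sample.value = fn(sample.value)
                self.timings[name] += time.perf_counter() - start
                sample.lineage.append(name)
            results.append(sample)
        return results


def provenance_edges(sample: TrackedSample) -> List[Tuple[str, str]]:
    # Turn a sample's lineage into directed edges: source -> step 1 -> step 2 ...
    nodes = [sample.source] + [f"{sample.source}:{step}" for step in sample.lineage]
    return list(zip(nodes, nodes[1:]))


# Usage with made-up sample identifiers and trivial numeric steps.
pipeline = (
    ProvenancePipeline()
    .map("scale", lambda x: x * 2.0)
    .map("shift", lambda x: x + 1.0)
)
samples = pipeline.run({"sim_a": 1.0, "sim_b": 3.0})
edges = provenance_edges(samples[0])
```

In this sketch the model code only sees `sample.value`, so provenance capture stays inside the pipeline wrapper and adds no extra calls to the training loop. A real implementation would likely attach to an existing data-loading framework rather than define its own pipeline class.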