A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows

Bibliographic Details
Main Authors:	Hoffmann, Nils; Ebrahimi Pour, Neda
Format:	Conference Proceeding
Language:	English
Subjects:	Computational fluid dynamics; Computational Fluid Dynamics (CFD); Computational modeling; Deep learning; Machine Learning; Metadata; Provenance; Runtime; Training; Training data

container_end_page	573
container_issue
container_start_page	567
container_title
container_volume
creator	Hoffmann, Nils; Ebrahimi Pour, Neda
description	Computational Fluid Dynamics (CFD) simulations are essential in various engineering applications. The use of high-performance computing has significantly expanded the scope of realizable models. However, balancing reasonable time-to-solution expectations with solution accuracy remains a bottleneck for many large-scale simulations. Machine learning (ML) algorithms have gained increasing popularity in the CFD community. Various data-based analysis methods have been deployed to predict CFD solutions and reduce the computational effort. The growing use of ML methods necessitates ensuring the reproducibility and transparency of data-driven methods and their associated training data processing steps to ensure reliability and trustworthiness of predictions. This paper proposes a new method for capturing provenance or lineage data during ML model training while minimizing development overhead by introducing tooling built on the commonly used data pipeline mechanism. To demonstrate the developed tooling, a deep learning model is trained using available CFD simulation data from an engineering test case. We demonstrate that a complete provenance graph of training and test samples can be automatically generated, along with valuable development metadata such as profiling of individual processing steps during model training.
doi_str_mv	10.1109/EuroSPW61312.2024.00092
format	conference_proceeding
eisbn	9798350367294
coden	IEEPAD
date	2024-07-08
pub	IEEE
tpages	7
linktohtml	https://ieeexplore.ieee.org/document/10628755
fulltext	fulltext_linktorsrc
identifier	EISSN: 2768-0657
ispartof	2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2024, p.567-573
issn	2768-0657
language	eng
recordid	cdi_ieee_primary_10628755
source	IEEE Xplore All Conference Series
subjects	Computational fluid dynamics; Computational Fluid Dynamics (CFD); Computational modeling; Deep learning; Machine Learning; Metadata; Provenance; Runtime; Training; Training data
title	A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows