A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows

Computational Fluid Dynamics (CFD) simulations are essential in various engineering applications. The use of high-performance computing has significantly expanded the scope of realizable models. However, balancing reasonable time-to-solution expectations with solution accuracy remains a bottleneck f...

Bibliographic Details
Main Authors: Hoffmann, Nils; Ebrahimi Pour, Neda
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
cited_by
cites
container_end_page 573
container_issue
container_start_page 567
container_title
container_volume
creator Hoffmann, Nils
Ebrahimi Pour, Neda
description Computational Fluid Dynamics (CFD) simulations are essential in various engineering applications. The use of high-performance computing has significantly expanded the scope of realizable models. However, balancing reasonable time-to-solution expectations with solution accuracy remains a bottleneck for many large-scale simulations. Machine learning (ML) algorithms have gained increasing popularity in the CFD community. Various data-based analysis methods have been deployed to predict CFD solutions and reduce the computational effort. The growing use of ML methods necessitates ensuring the reproducibility and transparency of data-driven methods and their associated training data processing steps to ensure reliability and trustworthiness of predictions. This paper proposes a new method for capturing provenance or lineage data during ML model training while minimizing development overhead by introducing tooling built on the commonly used data pipeline mechanism. To demonstrate the developed tooling, a deep learning model is trained using available CFD simulation data from an engineering test case. We demonstrate that a complete provenance graph of training and test samples can be automatically generated, along with valuable development metadata such as profiling of individual processing steps during model training.
doi_str_mv 10.1109/EuroSPW61312.2024.00092
format conference_proceeding
fulltext fulltext_linktorsrc
identifier EISSN: 2768-0657
ispartof 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2024, p.567-573
issn 2768-0657
language eng
recordid cdi_ieee_primary_10628755
source IEEE Xplore All Conference Series
subjects Computational fluid dynamics
Computational Fluid Dynamics (CFD)
Computational modeling
Deep learning
Machine Learning
Metadata
Provenance
Runtime
Training
Training data
title A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T04%3A32%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20Low%20Overhead%20Approach%20for%20Automatically%20Tracking%20Provenance%20in%20Machine%20Learning%20Workflows&rft.btitle=2024%20IEEE%20European%20Symposium%20on%20Security%20and%20Privacy%20Workshops%20(EuroS&PW)&rft.au=Hoffmann,%20Nils&rft.date=2024-07-08&rft.spage=567&rft.epage=573&rft.pages=567-573&rft.eissn=2768-0657&rft.coden=IEEPAD&rft_id=info:doi/10.1109/EuroSPW61312.2024.00092&rft.eisbn=9798350367294&rft_dat=%3Cieee_CHZPO%3E10628755%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i106t-b05d67808fd07bd789185de411fa2fc63f124ae48d883312c358795440d2f9df3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10628755&rfr_iscdi=true