A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows
Computational Fluid Dynamics (CFD) simulations are essential in various engineering applications. The use of high-performance computing has significantly expanded the scope of realizable models. However, balancing reasonable time-to-solution expectations with solution accuracy remains a bottleneck f...
Main Authors: | Hoffmann, Nils ; Ebrahimi Pour, Neda |
---|---|
Format: | Conference Proceeding |
Language: | English |
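Subjects: | Computational fluid dynamics ; Computational modeling ; Deep learning ; Machine Learning ; Metadata ; Provenance ; Runtime ; Training ; Training data |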
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 573 |
container_issue | |
container_start_page | 567 |
container_title | |
container_volume | |
creator | Hoffmann, Nils ; Ebrahimi Pour, Neda |
description | Computational Fluid Dynamics (CFD) simulations are essential in various engineering applications. The use of high-performance computing has significantly expanded the scope of realizable models. However, balancing reasonable time-to-solution expectations with solution accuracy remains a bottleneck for many large-scale simulations. Machine learning (ML) algorithms have gained increasing popularity in the CFD community. Various data-based analysis methods have been deployed to predict CFD solutions and reduce the computational effort. The growing use of ML methods necessitates ensuring the reproducibility and transparency of data-driven methods and their associated training data processing steps to ensure reliability and trustworthiness of predictions. This paper proposes a new method for capturing provenance or lineage data during ML model training while minimizing development overhead by introducing tooling built on the commonly used data pipeline mechanism. To demonstrate the developed tooling, a deep learning model is trained using available CFD simulation data from an engineering test case. We demonstrate that a complete provenance graph of training and test samples can be automatically generated, along with valuable development metadata such as profiling of individual processing steps during model training. |
doi_str_mv | 10.1109/EuroSPW61312.2024.00092 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10628755</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10628755</ieee_id><sourcerecordid>10628755</sourcerecordid><originalsourceid>FETCH-LOGICAL-i106t-b05d67808fd07bd789185de411fa2fc63f124ae48d883312c358795440d2f9df3</originalsourceid><addsrcrecordid>eNotj8tKAzEYRqMgWGrfQDAvMPXPPVkOpV5gpAUrXbgo6SSxsdNJyfRC394RXX2LczjwIfRAYEwImMfpMaf3-VISRuiYAuVjADD0Co2MMpoJYFJRw6_RgCqpC5BC3aJR1333GqPAAfQAfZa4Smc8O_m88dbhcr_PydYbHFLG5fGQdvYQa9s0F7zItt7G9gvPczr51ra1x7HFb70dW48rb3P7i5cpb0OTzt0dugm26fzof4fo42m6mLwU1ez5dVJWRSQgD8UahJNKgw4O1NopbYgWznNCgqWhliwQyq3n2mnN-q81E1oZwTk4GowLbIju_7rRe7_a57iz-bLq21QrIdgPe9lVew</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows</title><source>IEEE Xplore All Conference Series</source><creator>Hoffmann, Nils ; Ebrahimi Pour, Neda</creator><creatorcontrib>Hoffmann, Nils ; Ebrahimi Pour, Neda</creatorcontrib><description>Computational Fluid Dynamics (CFD) simulations are essential in various engineering applications. The use of high-performance computing has significantly expanded the scope of realizable models. However, balancing reasonable time-to-solution expectations with solution accuracy remains a bottleneck for many large-scale simulations. Machine learning (ML) algorithms have gained increasing popularity in the CFD community. Various data-based analysis methods have been deployed to predict CFD solutions and reduce the computational effort. The growing use of ML methods necessitates ensuring the reproducibility and transparency of data-driven methods and their associated training data processing steps to ensure reliability and trustworthiness of predictions. 
This paper proposes a new method for capturing provenance or lineage data during ML model training while minimizing development overhead by introducing tooling built on the commonly used data pipeline mechanism. To demonstrate the developed tooling, a deep learning model is trained using available CFD simulation data from an engineering test case. We demonstrate that a complete provenance graph of training and test samples can be automatically generated, along with valuable development metadata such as profiling of individual processing steps during model training.</description><identifier>EISSN: 2768-0657</identifier><identifier>EISBN: 9798350367294</identifier><identifier>DOI: 10.1109/EuroSPW61312.2024.00092</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Computational fluid dynamics ; Computational Fluid Dynamics (CFD) ; Computational modeling ; Deep learning ; Machine Learning ; Metadata ; Provenance ; Runtime ; Training ; Training data</subject><ispartof>2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2024, p.567-573</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10628755$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,778,782,787,788,27908,54538,54915</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10628755$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Hoffmann, Nils</creatorcontrib><creatorcontrib>Ebrahimi Pour, Neda</creatorcontrib><title>A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows</title><title>2024 IEEE European Symposium on Security and Privacy Workshops 
(EuroS&PW)</title><addtitle>EUROSPW</addtitle><description>Computational Fluid Dynamics (CFD) simulations are essential in various engineering applications. The use of high-performance computing has significantly expanded the scope of realizable models. However, balancing reasonable time-to-solution expectations with solution accuracy remains a bottleneck for many large-scale simulations. Machine learning (ML) algorithms have gained increasing popularity in the CFD community. Various data-based analysis methods have been deployed to predict CFD solutions and reduce the computational effort. The growing use of ML methods necessitates ensuring the reproducibility and transparency of data-driven methods and their associated training data processing steps to ensure reliability and trustworthiness of predictions. This paper proposes a new method for capturing provenance or lineage data during ML model training while minimizing development overhead by introducing tooling built on the commonly used data pipeline mechanism. To demonstrate the developed tooling, a deep learning model is trained using available CFD simulation data from an engineering test case. 
We demonstrate that a complete provenance graph of training and test samples can be automatically generated, along with valuable development metadata such as profiling of individual processing steps during model training.</description><subject>Computational fluid dynamics</subject><subject>Computational Fluid Dynamics (CFD)</subject><subject>Computational modeling</subject><subject>Deep learning</subject><subject>Machine Learning</subject><subject>Metadata</subject><subject>Provenance</subject><subject>Runtime</subject><subject>Training</subject><subject>Training data</subject><issn>2768-0657</issn><isbn>9798350367294</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2024</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotj8tKAzEYRqMgWGrfQDAvMPXPPVkOpV5gpAUrXbgo6SSxsdNJyfRC394RXX2LczjwIfRAYEwImMfpMaf3-VISRuiYAuVjADD0Co2MMpoJYFJRw6_RgCqpC5BC3aJR1333GqPAAfQAfZa4Smc8O_m88dbhcr_PydYbHFLG5fGQdvYQa9s0F7zItt7G9gvPczr51ra1x7HFb70dW48rb3P7i5cpb0OTzt0dugm26fzof4fo42m6mLwU1ez5dVJWRSQgD8UahJNKgw4O1NopbYgWznNCgqWhliwQyq3n2mnN-q81E1oZwTk4GowLbIju_7rRe7_a57iz-bLq21QrIdgPe9lVew</recordid><startdate>20240708</startdate><enddate>20240708</enddate><creator>Hoffmann, Nils</creator><creator>Ebrahimi Pour, Neda</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>20240708</creationdate><title>A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows</title><author>Hoffmann, Nils ; Ebrahimi Pour, Neda</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i106t-b05d67808fd07bd789185de411fa2fc63f124ae48d883312c358795440d2f9df3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computational fluid dynamics</topic><topic>Computational Fluid 
Dynamics (CFD)</topic><topic>Computational modeling</topic><topic>Deep learning</topic><topic>Machine Learning</topic><topic>Metadata</topic><topic>Provenance</topic><topic>Runtime</topic><topic>Training</topic><topic>Training data</topic><toplevel>online_resources</toplevel><creatorcontrib>Hoffmann, Nils</creatorcontrib><creatorcontrib>Ebrahimi Pour, Neda</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE/IET Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Hoffmann, Nils</au><au>Ebrahimi Pour, Neda</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows</atitle><btitle>2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)</btitle><stitle>EUROSPW</stitle><date>2024-07-08</date><risdate>2024</risdate><spage>567</spage><epage>573</epage><pages>567-573</pages><eissn>2768-0657</eissn><eisbn>9798350367294</eisbn><coden>IEEPAD</coden><abstract>Computational Fluid Dynamics (CFD) simulations are essential in various engineering applications. The use of high-performance computing has significantly expanded the scope of realizable models. However, balancing reasonable time-to-solution expectations with solution accuracy remains a bottleneck for many large-scale simulations. Machine learning (ML) algorithms have gained increasing popularity in the CFD community. Various data-based analysis methods have been deployed to predict CFD solutions and reduce the computational effort. 
The growing use of ML methods necessitates ensuring the reproducibility and transparency of data-driven methods and their associated training data processing steps to ensure reliability and trustworthiness of predictions. This paper proposes a new method for capturing provenance or lineage data during ML model training while minimizing development overhead by introducing tooling built on the commonly used data pipeline mechanism. To demonstrate the developed tooling, a deep learning model is trained using available CFD simulation data from an engineering test case. We demonstrate that a complete provenance graph of training and test samples can be automatically generated, along with valuable development metadata such as profiling of individual processing steps during model training.</abstract><pub>IEEE</pub><doi>10.1109/EuroSPW61312.2024.00092</doi><tpages>7</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2768-0657 |
ispartof | 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2024, p.567-573 |
issn | 2768-0657 |
language | eng |
recordid | cdi_ieee_primary_10628755 |
source | IEEE Xplore All Conference Series |
subjects | Computational fluid dynamics ; Computational Fluid Dynamics (CFD) ; Computational modeling ; Deep learning ; Machine Learning ; Metadata ; Provenance ; Runtime ; Training ; Training data |
title | A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T04%3A32%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20Low%20Overhead%20Approach%20for%20Automatically%20Tracking%20Provenance%20in%20Machine%20Learning%20Workflows&rft.btitle=2024%20IEEE%20European%20Symposium%20on%20Security%20and%20Privacy%20Workshops%20(EuroS&PW)&rft.au=Hoffmann,%20Nils&rft.date=2024-07-08&rft.spage=567&rft.epage=573&rft.pages=567-573&rft.eissn=2768-0657&rft.coden=IEEPAD&rft_id=info:doi/10.1109/EuroSPW61312.2024.00092&rft.eisbn=9798350367294&rft_dat=%3Cieee_CHZPO%3E10628755%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i106t-b05d67808fd07bd789185de411fa2fc63f124ae48d883312c358795440d2f9df3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10628755&rfr_iscdi=true |