Establishing MLOps for Continual Learning in Computing Clusters

Bibliographic Details
Published in: IEEE Software, 2024-07, p. 1-8
Main Authors: McSpadden, Diana; Jones, Mark; Mohammed, Ahmed Hossam; Hess, Bryan; Schram, Malachi
Format: Article
Language: English
Description
Summary: In our exploration of the evolving behavior of a computing cluster, we focus on building an MLOps continual learning capability to support a machine learning research and development project at Jefferson Lab, an organization focused on basic scientific research. Here, we describe a composable ML workflow, a custom CGroupV2 exporter, and the implementation of Prometheus, MLflow, and Grafana. In addition to supporting versioning, monitoring, and comparison, this integrated system also facilitates the delivery of models that adapt to the dynamic nature of a computing cluster.
ISSN: 0740-7459, 1937-4194
DOI: 10.1109/MS.2024.3424256