Establishing MLOps for Continual Learning in Computing Clusters
Published in: | IEEE Software, July 2024, pp. 1-8 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | In our exploration of the evolving behavior of a computing cluster, we focus on building an MLOps continual learning capability to support a machine learning research and development project at Jefferson Lab, an organization focused on basic scientific research. Here, we describe a composable ML workflow, a custom CGroupV2 exporter, and an implementation built on Prometheus, MLflow, and Grafana. In addition to supporting model versioning, monitoring, and comparison, this integrated system facilitates the delivery of models that adapt to the dynamic behavior of a computing cluster. |
ISSN: | 0740-7459 1937-4194 |
DOI: | 10.1109/MS.2024.3424256 |
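The summary mentions a custom CGroupV2 exporter feeding Prometheus. The authors' code is not part of this record, so the following is only a minimal Python sketch under stated assumptions. It reads the standard cgroup v2 files `cpu.stat` and `memory.current` from each cgroup directory and renders them in the Prometheus text exposition format. The metric names, the `cgroup` label, and all function names are hypothetical and may not match the paper's exporter.

```python
"""Minimal sketch of a cgroup v2 exporter (hypothetical; not the paper's code).

Reads cpu.stat and memory.current for each cgroup directory and renders the
values in the Prometheus text exposition format.
"""
from pathlib import Path
from typing import Dict, Iterable

# Hypothetical metric names, with their Prometheus type and HELP text.
METRICS = {
    "cgroup_cpu_usage_seconds_total": ("counter", "Total CPU time consumed by the cgroup."),
    "cgroup_memory_current_bytes": ("gauge", "Current memory usage of the cgroup."),
}


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse cgroup v2 cpu.stat ("key value" per line), skipping malformed lines."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def collect(cgroup_dirs: Iterable[Path]) -> Dict[str, Dict[str, float]]:
    """Read cpu.stat and memory.current from each cgroup v2 directory."""
    samples: Dict[str, Dict[str, float]] = {}
    for d in cgroup_dirs:
        cpu = parse_cpu_stat((d / "cpu.stat").read_text())
        mem = int((d / "memory.current").read_text().strip())
        samples[d.name] = {
            # cpu.stat reports microseconds; Prometheus convention is seconds.
            "cgroup_cpu_usage_seconds_total": cpu.get("usage_usec", 0) / 1e6,
            "cgroup_memory_current_bytes": float(mem),
        }
    return samples


def render_metrics(samples: Dict[str, Dict[str, float]]) -> str:
    """Render {cgroup: {metric: value}} as Prometheus exposition text."""
    lines = []
    for metric, (mtype, help_text) in METRICS.items():
        lines.append(f"# HELP {metric} {help_text}")
        lines.append(f"# TYPE {metric} {mtype}")
        for cgroup, values in sorted(samples.items()):
            if metric in values:
                label = escape_label(cgroup)
                lines.append(f'{metric}{{cgroup="{label}"}} {values[metric]}')
    return "\n".join(lines) + "\n"
```

In a deployment, text like this would typically be served over HTTP at a `/metrics` endpoint for Prometheus to scrape, with Grafana querying Prometheus for dashboards. Converting `usage_usec` to seconds follows the Prometheus convention of reporting base units.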