ModelSet: A labelled dataset of software models for machine learning

Bibliographic Details
Published in: Science of Computer Programming, 2024-01, Vol. 231, Article 103009
Main Authors: Hernández López, José Antonio; Cánovas Izquierdo, Javier Luis; Sánchez Cuadrado, Jesús
Format: Article
Language: English
Description
Summary: Curated collections of models are essential for the success of Machine Learning (ML) and Data Analytics in Model-Driven Engineering (MDE). However, current datasets are either too small or not properly curated. In this paper, we present ModelSet, a dataset composed of 5,466 Ecore models and 5,120 UML models which have been manually labelled to support ML tasks. We describe the structure of the dataset and explain how to use the associated library to develop ML applications in Python. Finally, we present some applications which can be addressed using ModelSet.
Tool Website: https://github.com/modelset
• We propose a labelled dataset of software models.
• The dataset contains 5,466 Ecore models and 5,120 UML models.
• We provide a Python library to facilitate applying ML techniques for MDE.
• The software is available at https://github.com/modelset.
ISSN: 0167-6423, 1872-7964
DOI: 10.1016/j.scico.2023.103009