Enabling Collaborative Data Science Development with the Ballet Framework

Bibliographic Details
Published in: Proceedings of the ACM on Human-Computer Interaction, 2021-10, Vol. 5 (CSCW2), pp. 1-39, Article 431
Main Authors: Smith, Micah J., Cito, Jürgen, Lu, Kelvin, Veeramachaneni, Kalyan
Format: Article
Language:English
Description
Summary: While the open-source software development model has led to successful large-scale collaborations in building software systems, data science projects are frequently developed by individuals or small teams. We describe challenges to scaling data science collaborations and present a conceptual framework and ML programming model to address them. We instantiate these ideas in Ballet, the first lightweight framework for collaborative, open-source data science through a focus on feature engineering, and an accompanying cloud-based development environment. Using our framework, collaborators incrementally propose feature definitions to a repository; each proposed definition is subjected to software and ML performance validation and can be automatically merged into an executable feature engineering pipeline. We leverage Ballet to conduct a case study analysis of an income prediction problem with 27 collaborators, and discuss implications for future designers of collaborative projects.
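The propose, validate, and merge workflow described in the summary can be illustrated with a minimal, self-contained sketch. This is not Ballet's API: the `Feature`, `Pipeline`, `passes_software_checks`, `passes_ml_check`, and `abs_correlation` names are hypothetical, the data is a toy example, and the ML performance check (absolute correlation with the target above a threshold) is a simple stand-in chosen for illustration, not the validation procedure the paper uses.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class Feature:
    # Hypothetical contributed feature definition: input columns plus a transform
    name: str
    inputs: List[str]
    transform: Callable[[List[float]], float]  # maps one row's input values to a value

    def apply(self, rows: List[Row]) -> List[float]:
        return [self.transform([row[c] for c in self.inputs]) for row in rows]


def abs_correlation(xs: List[float], ys: List[float]) -> float:
    # Absolute Pearson correlation; returns 0.0 if either variable is constant
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def passes_software_checks(feature: Feature, rows: List[Row]) -> bool:
    # Software validation: the feature runs and yields one finite number per row
    try:
        values = feature.apply(rows)
    except Exception:
        return False
    return len(values) == len(rows) and all(
        isinstance(v, (int, float)) and math.isfinite(v) for v in values
    )


def passes_ml_check(feature: Feature, rows: List[Row], target: List[float],
                    threshold: float) -> bool:
    # Stand-in ML performance validation (assumption): feature must track the target
    return abs_correlation(feature.apply(rows), target) > threshold


@dataclass
class Pipeline:
    features: List[Feature] = field(default_factory=list)

    def propose(self, feature: Feature, rows: List[Row], target: List[float],
                threshold: float = 0.1) -> bool:
        # Merge the proposed feature only if it passes both validation stages
        if passes_software_checks(feature, rows) and passes_ml_check(
            feature, rows, target, threshold
        ):
            self.features.append(feature)
            return True
        return False

    def transform(self, rows: List[Row]) -> List[List[float]]:
        # Run every accepted feature and return one feature vector per row
        columns = [f.apply(rows) for f in self.features]
        return [list(values) for values in zip(*columns)]


# Toy data (hypothetical); target is a binary label such as "income above a cutoff"
rows = [
    {"age": 25, "hours": 40},
    {"age": 40, "hours": 50},
    {"age": 55, "hours": 30},
    {"age": 35, "hours": 45},
]
target = [0, 1, 0, 1]

pipeline = Pipeline()
pipeline.propose(Feature("hours", ["hours"], lambda v: v[0]), rows, target)     # accepted
pipeline.propose(Feature("broken", ["age"], lambda v: v[0] / 0), rows, target)  # fails software check
pipeline.propose(Feature("constant", ["age"], lambda v: 1.0), rows, target)     # fails ML check
print([f.name for f in pipeline.features])  # prints ['hours']
```

The design point the sketch tries to capture is that each contribution is small and independently checked, so accepted features can be appended to the shared pipeline without a human re-reviewing the whole pipeline.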
ISSN: 2573-0142
DOI: 10.1145/3479575