Learning to detect community smells in open source software projects
Published in: | Knowledge-Based Systems, 2020-09, Vol. 204, Article 106201 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Community smells are symptoms of organizational and social issues within the software development community that often lead to additional project costs. Recent studies identified a variety of community smells and defined them as sub-optimal patterns connected to organizational-social structures in the software development community. To detect potential community smells early in a software project, we introduce in this paper a novel machine learning-based detection approach, named csDetector, that learns from various existing bad community development practices to provide automated support for detecting such smells. In particular, our approach learns from a set of organizational-social symptoms that characterize the existence of potential instances of community smells in a software project. We built a detection model using a decision tree, adopting the C4.5 classifier, to identify eight commonly occurring community smells in software projects. To evaluate the performance of our approach, we conducted an empirical study on a benchmark of 74 open source projects from GitHub. Our statistical results show a high performance of csDetector, which achieves an average accuracy of 96% and an AUC of 0.94. Moreover, our results indicate that csDetector outperforms two recent state-of-the-art techniques in terms of detection accuracy. Finally, we investigate the most influential community-related metrics for identifying each community smell type. We found that the number of commits and developers per time zone, the number of developers per community, and the social network betweenness and closeness centrality are the most influential community characteristics. |
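The summary names C4.5 as the classifier behind csDetector. C4.5 ranks candidate splits by the gain ratio: the information gain of a split divided by its split information. The following is a minimal, simplified sketch of that criterion for one numeric metric. It assumes midpoint thresholds and ignores pruning and missing values. It is not csDetector's implementation. The function names (`entropy`, `gain_ratio`, `best_threshold`) and the "developers per community" values and smell labels are hypothetical illustrations.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` (C4.5-style criterion)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    # Information gain: parent entropy minus the size-weighted child entropies.
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - weighted
    # Split information penalizes splits that fragment the data unevenly.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info if split_info > 0 else 0.0


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, gain ratio)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(
        ((t, gain_ratio(values, labels, t)) for t in candidates),
        key=lambda pair: pair[1],
        default=(None, 0.0),
    )


# Hypothetical metric: developers per community, with hypothetical smell labels.
devs_per_community = [2, 3, 4, 10, 12, 15]
smell = ["smell", "smell", "smell", "clean", "clean", "clean"]
print(best_threshold(devs_per_community, smell))  # (7.0, 1.0)
```

Here the split at 7.0 separates the two hypothetical classes perfectly, so it gets the maximal gain ratio of 1.0. A full tree learner would apply this choice recursively over every metric, and Quinlan's C4.5 adds pruning and missing-value handling on top.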
---|---|
ISSN: | 0950-7051, 1872-7409 |
DOI: | 10.1016/j.knosys.2020.106201 |