Automatic Classification of Recurring Tasks in Software Development Projects
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Summary: | Background: Information about project tasks stored in issue tracking systems (ITS) can be used for project analytics or process simulation. Such tasks can be categorized as stateful or recurrent. Although automatic categorization of stateful tasks is relatively simple, doing the same for recurrent tasks is a challenge. Aims: The goal of this study is to investigate whether machine-learning algorithms can automatically categorize recurrent tasks in software projects based on information stored in ITS. Method: We perform a study on a dataset from six industrial projects containing 9,589 tasks and augment it with an additional dataset of 91,145 task descriptions from other industrial projects to up-sample minority classes during training. We perform ten runs of 10-fold cross-validation for each project and evaluate classifiers using a set of state-of-the-art prediction quality metrics, i.e., Accuracy, Precision, Recall, F1-score, and MCC. Our machine-learning pipeline includes a Transformer-based sentence embedder ('mxbai-embed-large-v1') and an XGBoost classifier. Results: The model automatically classifies software process tasks into 14 classes with MCC ranging between 0.69 and 0.88 (mean: 0.77). We observed higher prediction quality for the largest projects in the dataset and those managed according to "traditional" project management methodologies. Conclusions: We conclude that machine-learning algorithms can effectively categorize recurrent tasks. However, this requires collecting a large balanced dataset of ITS tasks or using a pre-trained model like the one provided in this study. |
---|---|
ISSN: | 2376-9521 |
DOI: | 10.1109/SEAA64295.2024.00076 |
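
The evaluation protocol named in the summary (ten runs of 10-fold cross-validation, scored with MCC) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names `multiclass_mcc` and `repeated_kfold` are hypothetical, the splitter is plain (non-stratified) k-fold, which is an assumption since the record does not say how folds were drawn, and the embedding and XGBoost steps are left out entirely. The MCC function uses the common multiclass generalization of the Matthews correlation coefficient.

```python
import math
import random
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient.

    Returns 0.0 when the denominator is zero, e.g. when every
    prediction falls into a single class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                  # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correct predictions
    t_counts = Counter(y_true)                       # true occurrences per class
    p_counts = Counter(y_pred)                       # predicted occurrences per class
    labels = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    cov_pp = s * s - sum(v * v for v in p_counts.values())
    denom = math.sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold CV.

    Assumption: plain shuffling without stratification by class.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_splits):
            test = indices[fold::n_splits]           # every n_splits-th index
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield train, test
```

With the default arguments, `repeated_kfold` produces 100 train/test splits per project. A classifier would be fitted on each training split and `multiclass_mcc` computed on the corresponding test split, and the scores would then be summarized per project.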