Automatic Classification of Recurring Tasks in Software Development Projects


Bibliographic Details
Main Authors:	Wysocki, Wlodzimierz; Ochodek, Miroslaw
Format:	Conference Proceeding
Language:	English
Subjects:	Accuracy; automatic classification; Machine learning; Machine learning algorithms; Pipelines; Project management; Software; Software algorithms; Software development management; software project tasks classification; Training; Transformers

Description
Summary:	Background: Information about project tasks stored in issue tracking systems (ITS) can be used for project analytics or process simulation. Such tasks can be categorized as stateful or recurrent. Although automatic categorization of stateful tasks is relatively simple, doing the same for recurrent tasks remains a challenge. Aims: The goal of this study is to investigate the possibility of employing machine-learning algorithms to automatically categorize recurrent tasks in software projects based on information stored in ITS. Method: We perform a study on a dataset from six industrial projects containing 9,589 tasks and augment it with an additional dataset of 91,145 task descriptions from other industrial projects to up-sample minority classes during training. We perform ten runs of 10-fold cross-validation for each project and evaluate classifiers using a set of state-of-the-art prediction quality metrics, i.e., Accuracy, Precision, Recall, F1-score, and the Matthews correlation coefficient (MCC). Our machine-learning pipeline includes a Transformer-based sentence embedder ('mxbai-embed-large-v1') and an XGBoost classifier. Results: The model automatically classifies software process tasks into 14 classes with MCC ranging between 0.69 and 0.88 (mean: 0.77). We observed higher prediction quality for the largest projects in the dataset and those managed according to "traditional" project management methodologies. Conclusions: We conclude that machine-learning algorithms can effectively categorize recurrent tasks. However, this requires collecting a large balanced dataset of ITS tasks or using a pre-trained model like the one provided in this study.
ISSN:	2376-9521
DOI:	10.1109/SEAA64295.2024.00076
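
Illustration: The summary reports MCC for a 14-class problem, which implies a multiclass generalization of the coefficient. The record does not say which formulation the authors used, so the following is only a minimal sketch in Python that assumes the common confusion-matrix form (Gorodkin's R_K). The function name and the task labels in the example are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (R_K form).

    Assumption: this is the standard multiclass generalization; the
    catalog record does not state which variant the study computed.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                  # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified samples
    t_k = Counter(y_true)  # how often each class truly occurs
    p_k = Counter(y_pred)  # how often each class is predicted
    numerator = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    denominator = sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    # Common convention: report 0.0 when the coefficient is undefined,
    # e.g. when every sample is assigned to a single class.
    return numerator / denominator if denominator else 0.0


# Hypothetical labels, for illustration only.
y_true = ["meeting", "meeting", "review", "review", "testing", "testing"]
y_pred = ["meeting", "review", "review", "review", "testing", "testing"]
print(round(multiclass_mcc(y_true, y_pred), 3))
```

Unlike Accuracy, this coefficient accounts for how predictions are distributed across all classes, which is one reason it is often preferred when class sizes are imbalanced.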