Loading…
Code-Smell Detection as a Bilevel Problem
Code smells represent design situations that can affect the maintenance and evolution of software. They make the system difficult to evolve. Code smells are detected, in general, using quality metrics that represent some symptoms. However, the selection of suitable quality metrics is challenging due...
Saved in:
Published in: | ACM transactions on software engineering and methodology 2014-10, Vol.24 (1), p.1-44 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Code smells represent design situations that can affect the maintenance and evolution of software. They make the system difficult to evolve. Code smells are detected, in general, using quality metrics that represent some symptoms. However, the selection of suitable quality metrics is challenging due to the absence of consensus in identifying some code smells based on a set of symptoms and also the high calibration effort in determining manually the threshold value for each metric. In this article, we propose treating the generation of code-smell detection rules as a bilevel optimization problem. Bilevel optimization problems represent a class of challenging optimization problems, which contain two levels of optimization tasks. In these problems, only the optimal solutions to the lower-level problem become possible feasible candidates to the upper-level problem. In this sense, the code-smell detection problem can be treated as a bilevel optimization problem, but due to lack of suitable solution techniques, it has been attempted to be solved as a single-level optimization problem in the past. In our adaptation here, the upper-level problem generates a set of detection rules, a combination of quality metrics, which maximizes the coverage of the base of code-smell examples and artificial code smells generated by the lower level. The lower level maximizes the number of generated artificial code smells that cannot be detected by the rules produced by the upper level. The main advantage of our bilevel formulation is that the generation of detection rules is not limited to some code-smell examples identified manually by developers that are difficult to collect, but it allows the prediction of new code-smell behavior that is different from those of the base of examples. The statistical analysis of our experiments over 31 runs on nine open-source systems and one industrial project shows that seven types of code smells were detected with an average of more than 86% in terms of precision and recall. The results confirm the outperformance of our bilevel proposal compared to state-of-art code-smell detection techniques. The evaluation performed by software engineers also confirms the relevance of detected code smells to improve the quality of software systems. |
---|---|
ISSN: | 1049-331X 1557-7392 |
DOI: | 10.1145/2675067 |