Identifying Semantic Outliers of Source Code Artifacts and Their Application to Software Architecture Recovery

Bibliographic Details
Published in: IEEE Access, 2020, Vol. 8, pp. 212467-212477
Main Authors: Lee, Ki-Seong; Lee, Chan-Gun
Format: Article
Language:English
Description
Summary: Understanding software architecture is essential to software maintenance, and much effort has gone into deriving architecture views from source code artifacts. Alongside structural information, semantic information derived from identifier names and comments is typically helpful. However, because vocabulary choice depends on each developer's subjective decisions, some source code has semantically poor text quality, which can lead to inaccurate architecture recovery. This paper aims to improve the architecture recovery of a software system by identifying and removing semantic outliers among source code artifacts. To this end, we propose a novel measure, Conceptual Conformity (CC), which computes the similarity between two latent topic distributions: one obtained from the source code and one from its package. We use CC to identify source code that is not relevant to its package's semantic context and define such code as a semantic outlier. Because semantic outliers may cause inaccurate architecture recovery, we remove them during the recovery process. We apply our approach to three open-source projects. The results demonstrate that, for projects with low recovery performance, removing outliers leads to higher recovery accuracy.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3040024
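The summary describes Conceptual Conformity (CC) as a similarity between a source artifact's topic distribution and its package's topic distribution, with low-CC artifacts treated as semantic outliers. The sketch below illustrates that idea under stated assumptions. The similarity function (1 minus the base-2 Jensen-Shannon divergence), the leave-one-out package context (mean of the other files' distributions), the 0.5 threshold, and all names (`conceptual_conformity`, `find_semantic_outliers`, the sample files) are hypothetical choices for illustration, not the paper's exact formulation. Topic distributions are assumed to come from a topic model such as LDA and are given here as plain lists.

```python
import math
from typing import Dict, List, Sequence


def _kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric divergence, bounded in [0, 1] when using log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl_divergence(p, m) + 0.5 * _kl_divergence(q, m)


def conceptual_conformity(file_topics: Sequence[float],
                          package_topics: Sequence[float]) -> float:
    """Hypothetical CC: 1 - JSD, so 1.0 means identical topic distributions."""
    return 1.0 - jensen_shannon_divergence(file_topics, package_topics)


def find_semantic_outliers(package: Dict[str, Sequence[float]],
                           threshold: float = 0.5) -> List[str]:
    """Return files whose CC against the rest of their package is below threshold."""
    if len(package) < 2:
        return []  # no package context to compare against
    outliers = []
    for name, topics in package.items():
        others = [t for n, t in package.items() if n != name]
        # Assumption: the package context is the mean topic distribution
        # of the package's other files (leave-one-out).
        context = [sum(col) / len(others) for col in zip(*others)]
        if conceptual_conformity(topics, context) < threshold:
            outliers.append(name)
    return outliers


# Hypothetical package: two files share a dominant topic, one does not.
net_pkg = {
    "a.java": [0.90, 0.05, 0.05],
    "b.java": [0.85, 0.10, 0.05],
    "c.java": [0.05, 0.05, 0.90],
}
print(find_semantic_outliers(net_pkg))  # ['c.java']
```

Excluding each file from its own package context keeps a strongly off-topic file from pulling the package average toward itself; with the file included, `c.java` above would score about 0.70 and pass the threshold. In a recovery pipeline, the flagged files would be set aside before clustering, as the summary describes.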