Identifying Semantic Outliers of Source Code Artifacts and Their Application to Software Architecture Recovery
Published in: | IEEE Access, 2020, Vol. 8, pp. 212467-212477 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Summary: | Understanding software architecture is essential to software maintenance. Much effort has gone into deriving software architecture views from source code artifacts. Typically, along with structural information, semantic information derived from identifier names and comments is helpful. However, because the choice of code vocabulary depends on a developer's subjective decisions, some source code may contain text of low semantic quality, leading to inaccurate architecture recovery. This paper aims to improve the architecture recovery of a software system by identifying and removing the semantic outliers among its source code artifacts. Accordingly, we propose a novel measure, Conceptual Conformity (CC), which computes the similarity between the latent topic distribution of a source code artifact and that of its package. We use CC to identify source code that is not relevant to its package's semantic context and define such code as a semantic outlier. Because semantic outliers may cause inaccurate architecture recovery, we remove them during the recovery process. We apply our approach to three open-source projects. The results demonstrate that, for projects with low recovery performance, removing outliers leads to higher recovery accuracy. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2020.3040024 |
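
The summary describes CC only as a similarity between two latent topic distributions, so the following is a minimal sketch of that idea rather than the paper's actual formula. It assumes the topic distributions have already been obtained (for example from a topic model), uses cosine similarity as a stand-in similarity measure, and flags an artifact as an outlier when its score falls below a cutoff. The function names, the `threshold` value of 0.5, and the sample file names and numbers are all hypothetical.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic distributions.

    Assumption: cosine similarity stands in for the paper's CC measure,
    whose exact definition is not given in the summary.
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_p * norm_q)


def find_semantic_outliers(file_topics, package_topics, threshold=0.5):
    """Return the names of files whose similarity to the package is low.

    file_topics:    dict mapping a file name to its topic distribution
    package_topics: topic distribution of the enclosing package
    threshold:      hypothetical cutoff; the summary does not state one
    """
    outliers = [
        name
        for name, dist in file_topics.items()
        if cosine_similarity(dist, package_topics) < threshold
    ]
    return sorted(outliers)


# Hypothetical three-topic distributions, for illustration only.
package_topics = [0.7, 0.2, 0.1]
file_topics = {
    "Parser.java": [0.8, 0.1, 0.1],    # close to the package's topics
    "Logger.java": [0.05, 0.05, 0.9],  # dominated by a different topic
}

outliers = find_semantic_outliers(file_topics, package_topics)
print(outliers)
```

In a recovery pipeline along the lines the summary describes, the flagged files would be left out before the remaining artifacts are clustered into architectural components; how the paper performs that step is not stated here.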