Loading…

Investigating the use of lexical information for software system clustering

Developers have a lot of freedom in writing comments as well as in choosing identifiers and method names. These are intentional in nature and provide a different relevance of information to understand what a software system implements, and in particular the role of each source file. In this paper we...

Full description

Saved in:
Bibliographic Details
Main Authors: Corazza, A., Di Martino, Sergio, Maggio, V., Scanniello, G.
Format: Conference Proceeding
Language:English
Subjects:
Citations: Items that cite this one
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Developers have a lot of freedom in writing comments as well as in choosing identifiers and method names. These are intentional in nature and provide a different relevance of information to understand what a software system implements, and in particular the role of each source file. In this paper we investigate the effectiveness of exploiting lexical information for software system clustering. In particular we explore the contribution of the combined use of six different dictionaries, corresponding to the six parts of the source code where programmers introduce lexical information, namely: class, attribute, method and parameter names, comments, and source code statements. Their relevance has been weighted by means of a probabilistic model, whose parameters have been estimated by the Expectation-Maximization algorithm. To group source files accordingly we used a hierarchical clustering algorithm. The investigation has been conducted on a dataset of 13 open source Java software systems.
ISSN:1534-5351
2640-7574
DOI:10.1109/CSMR.2011.8