Large-Scale Heterogeneous Program Retrieval through Frequent Pattern Discovery and Feature Correlation Analysis

Bibliographic Details
Main Authors: Bo Liu, Liang Wu, Qiuxiang Dong, Yuanchun Zhou
Format: Conference Proceeding
Language: English
Description
Summary: In the era of big data, information retrieval is increasingly challenging: data volumes are growing fast, and it is difficult to find the right information in huge heterogeneous datasets. In the software engineering domain especially, it is harder still to retrieve the right program from projects that are written in different languages and are not well developed. Prior work addressed this problem by extracting words from programs, which cannot fully exploit the information in source code. In this paper, we propose a novel program retrieval method that extracts frequent patterns and analyzes their correlations with the accompanying text information. Experimental results on large-scale, heterogeneous datasets validate the effectiveness of the proposed approach. The inferred semantics of programs can significantly improve the accuracy of code artifact retrieval.
ISSN: 2379-7703
DOI: 10.1109/BigData.Congress.2014.120