Large-Scale Heterogeneous Program Retrieval through Frequent Pattern Discovery and Feature Correlation Analysis
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | In the era of big data, information retrieval is increasingly challenging: data volumes are growing rapidly, and it is difficult to find the right information in huge, heterogeneous datasets. In the software engineering domain in particular, it is even harder to retrieve the right program from projects that are written in different languages and are not well developed. Prior work addressed this problem by extracting words from programs, which cannot fully exploit the information in source code. In this paper, we propose a novel program retrieval method that extracts frequent patterns and analyzes their correlations with accompanying text information. Experimental results on large-scale, heterogeneous datasets validate the effectiveness of the proposed approach. The inferred semantics of programs can significantly improve the accuracy of code artifact retrieval. |
---|---|
ISSN: | 2379-7703 |
DOI: | 10.1109/BigData.Congress.2014.120 |