A mutual embedded self-attention network model for code search

Bibliographic Details
Published in: The Journal of systems and software, 2023-04, Vol. 198, p. 111591, Article 111591
Main Authors: Hu, Haize, Liu, Jianxun, Zhang, Xiangping, Cao, Ben, Cheng, Siqiang, Long, Teng
Format: Article
Language: English
Description
Summary: To improve the efficiency of program implementation, developers can selectively reuse previously written code by searching open-source codebases. Many code search methods have been proposed to push the limit of code search accuracy, and those built on the Self-Attention mechanism are particularly promising. However, while existing methods capture textual semantics more efficiently by attending to significant words within a code component unit, they typically fail to capture the structural dependencies between code components, which may lead to suboptimal search accuracy. In this paper, we propose a novel Self-Attention model, termed MESN-CS, which considers both word-level attention and code unit-level attention for code search. MESN-CS calculates not only the attention weight of each word within a code component unit but also the weight of the embedding between code combination units. To verify the effectiveness of the proposed model, it was compared with three benchmark models on a large-scale code dataset and on CodeSearchNet. The experimental results show that MESN-CS achieves better Recall@k, NDCG and MRR than the baseline methods. The experiments also show that MESN-CS can effectively characterize the semantic and syntactic information between sequences.
• The defects and shortcomings of existing code search models are analyzed.
• The MESN-CS model is studied.
• An experimental analysis was conducted to verify the effectiveness of MESN-CS, DeepCS, CARLCS-CNN and SAN-CS in code search.
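The summary reports results in terms of Recall@k, MRR and NDCG. As a rough illustration of how these ranking metrics are commonly computed for code search, the sketch below assumes that each query has exactly one correct code snippet and that `ranks` holds the 1-based position of that snippet in each query's result list. The function names and the sample ranks are hypothetical and are not taken from the article; the paper's exact evaluation protocol may differ.

```python
import math


def recall_at_k(ranks, k):
    """Fraction of queries whose correct snippet appears in the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mrr(ranks):
    """Mean reciprocal rank: average of 1/rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def ndcg(ranks):
    """NDCG under the single-relevant-item assumption.

    With one relevant snippet per query and binary relevance, the ideal DCG
    is 1, so each query contributes 1 / log2(1 + rank).
    """
    return sum(1.0 / math.log2(1 + r) for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Hypothetical ranks of the correct snippet for four queries.
    sample_ranks = [1, 3, 2, 12]
    for k in (1, 5, 10):
        print(f"Recall@{k}:", recall_at_k(sample_ranks, k))
    print("MRR:", mrr(sample_ranks))
    print("NDCG:", ndcg(sample_ranks))
```

All three metrics lie in the range 0 to 1 and increase as the correct snippet moves closer to the top of the result list; Recall@k ignores the exact position within the top k, whereas MRR and NDCG reward higher positions.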
ISSN: 0164-1212, 1873-1228
DOI: 10.1016/j.jss.2022.111591