
Statement-based Memory for Neural Source Code Summarization

Bibliographic Details
Published in: arXiv.org 2023-07
Main Authors: Bansal, Aakash; Jiang, Siyuan; Haque, Sakib; McMillan, Collin
Format: Article
Language: English
Subjects: Coders; Descriptions; Encoders-Decoders; Programmers; Source code; Subroutines
container_title arXiv.org
creator Bansal, Aakash
Jiang, Siyuan
Haque, Sakib
McMillan, Collin
description Source code summarization is the task of writing natural language descriptions of source code behavior. Code summarization underpins software documentation for programmers. Short descriptions of code help programmers understand the program quickly without having to read the code itself. Lately, neural source code summarization has emerged as the frontier of research into automated code summarization techniques. By far the most popular targets for summarization are program subroutines. The idea, in a nutshell, is to train an encoder-decoder neural architecture using large sets of examples of subroutines extracted from code repositories. The encoder represents the code and the decoder represents the summary. However, most current approaches attempt to treat the subroutine as a single unit, for example by taking the entire subroutine as input to a Transformer or RNN-based encoder. But code behavior tends to depend on the flow from statement to statement. Normally, dynamic analysis may shed light on this flow, but running dynamic analysis on hundreds of thousands of examples in large datasets is not practical. In this paper, we present a statement-based memory encoder that learns the important elements of flow during training, leading to a statement-based subroutine representation without the need for dynamic analysis. We implement our encoder for code summarization and demonstrate a significant improvement over the state of the art.
format article
publisher Cornell University Library, arXiv.org (Ithaca)
date 2023-07-21
rights 2023. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-07
issn 2331-8422
language eng
recordid cdi_proquest_journals_2841192502
source Publicly Available Content Database
subjects Coders
Descriptions
Encoders-Decoders
Programmers
Source code
Subroutines
title Statement-based Memory for Neural Source Code Summarization
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T15%3A37%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Statement-based%20Memory%20for%20Neural%20Source%20Code%20Summarization&rft.jtitle=arXiv.org&rft.au=Bansal,%20Aakash&rft.date=2023-07-21&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2841192502%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_28411925023%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2841192502&rft_id=info:pmid/&rfr_iscdi=true