Statement-based Memory for Neural Source Code Summarization
Source code summarization is the task of writing natural language descriptions of source code behavior. Code summarization underpins software documentation for programmers. Short descriptions of code help programmers understand the program quickly without having to read the code itself. Lately, neur...
Published in: | arXiv.org 2023-07 |
---|---|
Main Authors: | Bansal, Aakash; Jiang, Siyuan; Haque, Sakib; McMillan, Collin |
Format: | Article |
Language: | English |
Subjects: | Coders; Descriptions; Encoders-Decoders; Programmers; Source code; Subroutines |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Bansal, Aakash; Jiang, Siyuan; Haque, Sakib; McMillan, Collin |
description | Source code summarization is the task of writing natural language descriptions of source code behavior. Code summarization underpins software documentation for programmers. Short descriptions of code help programmers understand the program quickly without having to read the code itself. Lately, neural source code summarization has emerged as the frontier of research into automated code summarization techniques. By far the most popular targets for summarization are program subroutines. The idea, in a nutshell, is to train an encoder-decoder neural architecture using large sets of examples of subroutines extracted from code repositories. The encoder represents the code and the decoder represents the summary. However, most current approaches attempt to treat the subroutine as a single unit. For example, by taking the entire subroutine as input to a Transformer or RNN-based encoder. But code behavior tends to depend on the flow from statement to statement. Normally dynamic analysis may shed light on this flow, but dynamic analysis on hundreds of thousands of examples in large datasets is not practical. In this paper, we present a statement-based memory encoder that learns the important elements of flow during training, leading to a statement-based subroutine representation without the need for dynamic analysis. We implement our encoder for code summarization and demonstrate a significant improvement over the state-of-the-art. |
format | article |
date | 2023-07-21 |
publisher | Ithaca: Cornell University Library, arXiv.org |
rights | 2023. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
oa | free_for_read |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-07 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2841192502 |
source | Publicly Available Content Database |
subjects | Coders; Descriptions; Encoders-Decoders; Programmers; Source code; Subroutines |
title | Statement-based Memory for Neural Source Code Summarization |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T15%3A37%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Statement-based%20Memory%20for%20Neural%20Source%20Code%20Summarization&rft.jtitle=arXiv.org&rft.au=Bansal,%20Aakash&rft.date=2023-07-21&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2841192502%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_28411925023%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2841192502&rft_id=info:pmid/&rfr_iscdi=true |