Loading…

ARCLIN: automated API mention resolution for unformatted texts

Online technical forums (e.g., StackOverflow) are popular platforms for developers to discuss technical problems such as how to use a specific Application Programming Interface (API), how to solve the programming tasks, or how to fix bugs in their code. These discussions can often provide auxiliary...

Full description

Saved in:
Bibliographic Details
Main Authors: Huo, Yintong, Su, Yuxin, Zhang, Hongming, Lyu, Michael R.
Format: Conference Proceeding
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Online technical forums (e.g., StackOverflow) are popular platforms for developers to discuss technical problems such as how to use a specific Application Programming Interface (API), how to solve the programming tasks, or how to fix bugs in their code. These discussions can often provide auxiliary knowledge of how to use the software that is not covered by the official documents. The automatic extraction of such knowledge may support a set of downstream tasks like API searching or indexing. However, unlike official documentation written by experts, discussions in open forums are made by regular developers who write in short and informal texts, including spelling errors or abbreviations. There are three major challenges for the accurate APIs recognition and linking mentioned APIs from unstructured natural language documents to an entry in the API repository: (1) distinguishing API mentions from common words; (2) identifying API mentions without a fully qualified name; and (3) disambiguating API mentions with similar method names but in a different library. In this paper, to tackle these challenges, we propose an ARCLIN tool, which can effectively distinguish and link APIs without using human annotations. Specifically, we first design an API recognizer to automatically extract API mentions from natural language sentences by a Conditional Random Field (CRF) on the top of a Bi-directional Long Short-Term Memory (Bi-LSTM) module, then we apply a context-aware scoring mechanism to compute the mention-entry similarity for each entry in an API repository. Compared to previous approaches with heuristic rules, our proposed tool without manual inspection outperforms by 8% in a high-quality dataset Py-mention, which contains 558 mentions and 2,830 sentences from five popular Python libraries. To our best knowledge, ARCLIN is the first approach to achieve full automation of API mention resolution from unformatted text without manually collected labels.
ISSN:1558-1225
DOI:10.1145/3510003.3510158